Proceedings


MWSUG 2017 Paper Presentations

Paper presentations are the heart of a SAS users group meeting. MWSUG 2017 will feature dozens of paper presentations organized into 13 academic sections covering a variety of topics and experience levels.

Note: Content and schedule are subject to change. Last updated 10-Aug-2017.



BI / Customer Intelligence

Paper No. Author(s) Paper Title
BI01 Craig Eckberg Creating the Impossible Report - Using SAS 9.4, OLAP Cube Studio 4.4, Information Map Studio 4.4, and Web Report Studio 4.4


Banking and Finance

Paper No. Author(s) Paper Title
BF01 Samuel Berestizhevsky
& Tanya Kolosova
The Cox Hazard Model for Claims Data: Bayesian non-parametric approach
BF02 Chaoxian Cai Computing Risk Measures for Loan Facilities with Multiple Lines of Draws
BF03 Hairong Gu et al. Untangle Customer's Incrementality using Two-fold Uplift Modeling with a Case Study on Direct Marketing Campaign
BF04 Brent Whitesell Managing Security Issues for SAS in the Financial Industry an Explicit Example with the Guardium firewall.


Beyond the Basics SAS

Paper No. Author(s) Paper Title
BB015 Art Carpenter Advanced Macro: Driving a Variable Parameter System with Metadata
BB042 Derek Morgan Demystifying Intervals
BB047 John Schmitz Extraction and Use of Text Strings with SAS when Source exceeds the 32k String Length Limit
BB049 Lingqun Liu SAS Programming Efficiency: the Basic Concepts and Tips
BB050 Matthew Nizol Efficient Fact Processing using Hash Tables and Bitwise Operations
BB071 Josh Horstman Fifteen Functions to Supercharge Your SAS Code
BB113 Ben Cochran You Did That With SAS? Combining Text with Graphics Output to Create Great Looking Reports.
BB114 Ben Cochran Tackling Unique Problems by Using TWO SET Statements in ONE DATA Step
BB124 Lynn Mullins
& Richann Watson
Exploring HASH Tables vs. SORT/DATA Step vs. PROC SQL
BB125 Lynn Mullins One, Two, Three and You're ODS!! Three new Output Delivery System Procedures in SAS® 9.4: ODSLIST, ODSTABLE, and ODSTEXT
BB129-SA Jason Secosky DATA Step in SAS Viya: Essential New Features


Data Visualization and Graphics

Paper No. Author(s) Paper Title
DV01 Kirk Paul Lafler An Introduction to ODS Statistical Graphics
DV02 Ilya Krivelevich et al. Waterfall Plots in Oncology Studies in the Case of Multi-Arms Design
DV04 Ting Sa A Macro that can Create U.S State and U.S County KML Files
DV07 Mia Lyst et al. A Big Data Challenge: Visualizing Social Media Trends about Cancer using SAS® Text Miner
DV08-SAS Cheryl Coyle Data Can Be Beautiful: Crafting a Compelling Story with SAS® Visual Analytics


Data for Good

Paper No. Author(s) Paper Title
DG01 Brandy Sinco et al. Correlation and Structural Equation Analysis on the Effects of Anti-Discrimination Policies and Resources on the Well Being of Lesbian, Gay, and Bisexual College Students
DG02 Deanna Schreiber-Gregory Exploring the Relationship Between Substance Abuse and Dependence Disorders and Discharge Status: Results and Implications
DG03 David Corliss Lag Models with Social Response Outcomes
DG04 Andrea Frazier The (Higher) Power of SAS®
DG05 Julie Mullins
& Lynn Mullins
The Influence of Dark Triad Personality Traits on Prosocial Behavior


Hands-on Workshops

Paper No. Author(s) Paper Title
HW01 Kirk Paul Lafler Hands-on Introduction to SAS® and the ODS Excel® Destination
HW02 Ben Cochran Using a Few SAS Functions to Clean Dirty Data
HW03 Kent Phelps
& Ronda Phelps
Base SAS® and SAS® Enterprise Guide® ~ Automate Your SAS World with Dynamic Code; Your Newest BFF (Best Friend Forever) in SAS
HW04 Ted Conway Hands-On with an Excel-Based Code Playground for Creating and Sharing SAS ODS Graphics
HW05 Chuck Kincaid Intermediate SAS® ODS Graphics


Pharmaceutical Applications

Paper No. Author(s) Paper Title
PH01 Lingling Xie
& Xiaoqi Li
Challenges facing in Mapping MRI data to SDTM and ADaM
PH02 Derek Morgan ISO 8601 and SAS®: A Practical Approach
PH03 Patti Radke-Connell PATIENT SAFETY AND DISEASE PROFILING
PH04 Hao Sun et al. AIR Binder 2.0: A Dynamic Visualization, Data Analysis and Reporting SAS Application for Preclinical and Clinical ADME Assays, Pharmacokinetics, Metabolite Profiling and Identification
PH05 Richann Watson
& Josh Horstman
Automated Validation of Complex Clinical Trials Made Easy
PH06 Nancy Brucken
& Karin Lapann
ADQRS: Basic Principles for Building Questionnaire, Rating and Scale Analysis Datasets
PH07 Keith Dunnigan Standard Two-Formulation Bioequivalence Testing


Rapid Fire

Paper No. Author(s) Paper Title
RF01 Jayanth Iyengar Ignorance is not bliss - understanding SAS applications and product contents
RF02 Art Carpenter Quotes within Quotes: When Single (') and Double (") Quotes are not Enough
RF03 Ming Yan No News Is Good News: A Smart Way to Impute Missing Clinical Trial Lab Data
RF05 Ting Sa Macro that can Provide More Information for your Character Variables
RF06 Aaron Barker Cleaning Messy Data: SAS Techniques to Homogenize Tax Payment Data
RF07 Louise Hadden PROC DOC III: Self-generating Codebooks Using SAS®
RF08 Nancy Brucken What Are Occurrence Flags Good For Anyway?


SAS 101

Paper No. Author(s) Paper Title
SA01 Kirk Paul Lafler An Introduction to PROC REPORT
SA02 Jayanth Iyengar If you need these OBS and these VARS, then drop IF, and keep WHERE
SA03 Derek Morgan The Essentials of SAS® Dates and Times
SA04 Derek Morgan PROC SORT (then and) NOW
SA05 Haiyin Liu
& Wei Ai
Working with Datetime Variable from Stata
SA06 Josh Horstman Merge with Caution: How to Avoid Common Problems when Combining SAS Datasets
SA07 Josh Horstman Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS® Code
SA09 Andrew Kuligowski Parsing Useful Data Out of Unusual Formats Using SAS®
SA10 Andrew Kuligowski The Building Blocks of SAS® Datasets - S-M-U (Set, Merge, and Update)
SA11 Art Carpenter Before You Get Started: A Macro Language Preview in Three Parts
SA12 Joe Matise Writing Code With Your Data: Basics of Data-Driven Programming Techniques
SA13 Ben Cochran Make That Report Look Great Using the Versatile PROC TABULATE


Statistics / Advanced Analytics

Paper No. Author(s) Paper Title
AA01 Brandy Sinco et al. Using SAS to Compare Two Estimation Methods on the Same Outcome: Example from First Trimester Pregnancy Weights
AA02 Bruce Lund Logistic Model Selection with SAS® PROC's LOGISTIC, HPLOGISTIC, HPGENSELECT
AA04 Mei Najim Claim Analytics
AA05 Robin High Unconventional Statistical Models with the NLMIXED Procedure
AA06 Jia Li
& Aaron Sussell
Multiple Imputation of Family Income Data in the 2015 Behavioral Risk Factor Surveillance System
AA07 Chad Cogan et al. GREMOVE, Reassign, and let's GMAP! A SAS Trick for Generating Contiguous Map Boundaries for Market-Level Research
AA08 Shana Kelly Correcting for Selection Bias in a Clinical Trial
AA11 Mohsen Asghari et al. Dimensionality Reduction using Hadamard, Discrete Cosine and Discrete Fourier Transforms in SAS
AA12 Andrea Frazier Nothing to SNF At: Evaluating an intervention to reduce skilled nursing home (SNF) length of stay
AA13 Logan Edmonds How Can an NBA Player Be Clutch?: A Logistic Regression Analysis
AA14 Deanna Schreiber-Gregory Multicollinearity: What Is It and What Can We Do About It?
AA15 Karthik Sripathi OSCARS 2017 - TEXT MINING & SENTIMENTAL ANALYSIS
AA16 Vaibhav Vanamala Text and Sentiment Analysis of customer tweets of Nokia using SAS® Enterprise Miner and SAS® Sentiment Analysis Studio
AA17 Vasudev Sharma Tornado Inflicted Damages Pattern
AA18 Alan Dybing Agricultural Trip Generation - Linking Spatial Data and Travel Demand Modeling using SAS
AA19-SAS Mike Patetta Getting Started with Multilevel Modeling
AA20-SAS John Castelloe Power and Sample Size Computations
AA99 Tho Nguyen
& Paul Segal
Tips and Best Practices Using SAS® in the Analytical Data Life Cycle


System Architecture and Administration

Paper No. Author(s) Paper Title
SY01 Bob Matsey Using Agile Analytics for Data Discovery
SY02 Piyush Singh et al. Read SAS Metadata Content in SAS Enterprise Guide


Tools of the Trade

Paper No. Author(s) Paper Title
TT01 Jack Shoemaker Generating Reliable Population Rates Using SAS® Software
TT02 Richann Watson Check Please: An Automated Approach to Log Checking
TT03 Misty Johnson Arbovirus, Varicella and More: Using SAS® for Reconciliation of Disease Counts
TT04 Paul Kaefer Code Like It Matters: Writing Code That's Readable and Shareable
TT05 Matthew Nizol Quality Assurance Strategies for Analytic Code
TT06 Laurie Bishop From Device Text Data to a Quality Dataset
TT07 Doug Zirbel Proc Transpose Cookbook
TT08 Louise Hadden Get Smart! Eliminate Kaos and Stay in Control - Creating a Complex Directory Structure with the DLCREATEDIR Statement, SAS® Macro Language, and Control Tables
TT09 Lakhpreet Gill An array of possibilities: using arrays to manipulate longitudinal survey data
TT10 David Oesper Fully Automated Updating of Arbitrarily Complex Excel Workbooks


e-Posters

Paper No. Author(s) Paper Title
PO01 Louise Hadden Red Rover, Red Rover, Send Data Right Over: Exploring External Geographic Data Sources with SAS®
PO02 Louise Hadden SAS/GRAPH® and GfK Maps: a Subject Matter Expert Winning Combination
PO03 Lakshmi Nirmala Bavirisetty
& Deanna Schreiber-Gregory
Data Quality Control: Using High Performance Binning to Prevent Information Loss
PO04 Hanadi Ajam Oughli et al. Clinical and genetic biomarkers as moderators of fat gain in antipsychotic treated older adults
PO05 Lynn Mullins
& Richann Watson
Let's Get FREQy with our Statistics: Let SAS® Determine the Appropriate Test Statistic Based on Your Data




Abstracts

BI / Customer Intelligence

BI01 : Creating the Impossible Report - Using SAS 9.4, OLAP Cube Studio 4.4, Information Map Studio 4.4, and Web Report Studio 4.4
Craig Eckberg, UnitedHealth Group

SAS has served our organization well as a conduit bringing disparate data together across a variety of platforms. While our user community has grown accustomed to a self-service model of do-it-yourself reporting from the vast set of libraries our system offers, we have been light on business intelligence offerings. One of the groups our system supports is within UHC's Community & State business segment. They approached us about automating and enhancing a monthly Product Category Allocation report that their team produces for executive leadership. In its existing form, the report was a cube extract mixed with manual copy-and-paste adjustments; it was time-consuming and prone to error. They were looking for a better way. That better way required more than a join of a few datasets, a couple of prompts, and a nicely formatted report. They wanted the ability to expand and contract account trees and to have different levels within the account tree lined up on the same set of report columns. They wanted three sections, each tracking a different data type: 1) Revenue/Expense/Gross Margin/Membership, along with 2) percentages based on the first section, and 3) Per Member Per Month (PMPM) amounts based on the first section. They wanted to be able to adjust the allocation in the reporting either by a set of percentages in a driver file or by a calculation based on the revenue or membership of the Product Category distribution. In short, they wanted the Impossible Report.


Banking and Finance

BF01 : The Cox Hazard Model for Claims Data: Bayesian non-parametric approach
Samuel Berestizhevsky, Consultant
Tanya Kolosova, Co-author

The central piece of claim management is claims modeling. Two strategies are commonly used by insurers to analyze claims: the two-part approach, which decomposes claims cost into frequency and severity components, and the pure premium approach, which uses the Tweedie distribution. In this article, we provide a general framework for modeling claims using the Cox hazard model. The Cox proportional hazard (PH) model is a standard tool in survival analysis for studying the dependence of a hazard rate on covariates and time. This article is a case study intended to indicate a possible application of the Cox PH model to workers' compensation insurance, particularly the occurrence of claims (disregarding claim size). In our study, the claims data are from workers' compensation insurance in selected industries and states in the United States for the two-year period from November 1, 2014 through October 31, 2016. We present an application of the Bayesian approach to survival (time-to-event) analysis that allows dealing with violations of the assumptions of the Cox PH model. The paper uses SAS 9.2 on any operating system and assumes no particular skill level or statistical or machine-learning background.


BF02 : Computing Risk Measures for Loan Facilities with Multiple Lines of Draws
Chaoxian Cai, BMO Harris Bank

In commercial lending, a commitment facility may have multiple lines of draws with hierarchical loan structures. The risk measures for main, limit, and sublimit commitments are usually aggregated and reported at the main obligation level. Thus, finding all hierarchical loan structures in loan and commitment tables is required in order to aggregate the risk measures. In this paper, I will give a brief introduction to commercial loans, from simple standalone loans to revolving and non-revolving commitments with complex loan structures. I will present a SAS macro program that can be used to identify main obligations and loan structures of future commitments from loan and commitment relational tables using Base SAS DATA steps. Risk measures such as exposure at default (EAD) and credit conversion factor (CCF) are computed for these complicated loans and illustrated with examples.


BF03 : Untangle Customer's Incrementality using Two-fold Uplift Modeling with a Case Study on Direct Marketing Campaign
Hairong Gu, Alliance Data
Yi Cao, Alliance Data
Chao Xu, Alliance Data

It is well known that uplift modeling, which directly models the incremental impact of a marketing promotion on consumer behavior, helps marketers identify and target those persuadable customers whose propensity to respond is driven by the promotion. However, in reality, some of those "persuadables" are promotion chasers, meaning that they tend to exploit the offer without contributing to bottom-line sales. In this paper, we propose a two-fold uplift model that models both incremental response rate and incremental sales. Using this model, promotion riders among the "persuadables" can be further identified, and customers who can bring maximal incremental campaign ROI will surface. We demonstrate this two-fold uplift model through a case study on a real-world direct marketing campaign. The goal of this study is to pinpoint a roadmap for building useful uplift models with both technical and business acumen to bolster the success of marketing campaigns.


BF04 : Managing Security Issues for SAS in the Financial Industry an Explicit Example with the Guardium firewall.
Brent Whitesell, Commerce Bank

The financial industry has both a high level of regulation and a high level of risk with regard to the protection of sensitive customer information. Providing ready access to users and strong protection at the same time can often be problematic. This presentation discusses the need to properly configure security through a challenge we faced: the SAS server was not showing up in the Guardium firewall repository, and administrators were not able to run an ls -l command on the guarded logs directory to return its listing. Several ping commands and the health checks that the Guardium agent installs were used to isolate the issue. Once it was determined that the administrators were not able to ping the default gateway and other servers, the security team checked the firewall repository to verify whether our SAS server was in that listing. Once the firewall listing was updated, all the ls commands worked as expected and the scheduled jobs all ran on time.


Beyond the Basics SAS

BB015 : Advanced Macro: Driving a Variable Parameter System with Metadata
Art Carpenter, CA Occidental Consultants

When faced with generating a series of reports, graphs, and charts, we will often use the macro language to simplify the process. Commonly we will write a series of generalized macros, each with the capability of creating a variety of outputs that depend on the macro parameter inputs. For large projects, potentially with hundreds of outputs, controlling the macro calls can itself become difficult. The use of control files (metadata) to organize processing when a single macro is to be executed multiple times was discussed by Rosenbloom and Carpenter (2015). But those techniques only partially help us when multiple macros, each with its own set of parameters, are to be called. This paper discusses a technique that allows you to control the order of macro calls, along with each macro's parameter set, while using a metadata control file.
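A minimal sketch of this metadata-driven pattern (not the author's code; the macro names %report and %graph and the control data set layout are hypothetical):

```sas
/* Control data set: one row per macro call, in execution order. */
data control;
  infile datalines dsd;
  length macname $32 parms $200;
  input macname $ parms $;
datalines;
report,"dset=sashelp.class, var=height"
graph,"dset=sashelp.cars, var=mpg_city"
;
run;

/* Generate and stack the macro calls; %NRSTR defers execution of   */
/* each generated call until after this DATA step completes.        */
data _null_;
  set control;
  call execute(cats('%nrstr(%', macname, ')(', parms, ')'));
run;
```

Extending the control data set with more rows (or more columns of parameters) changes which macros run, and in what order, without touching the driver code.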


BB042 : Demystifying Intervals
Derek Morgan, PAREXEL

Intervals have been a feature of base SAS for a long time, allowing SAS users to work with commonly (and not-so-commonly) defined periods of time such as years, months, and quarters. With the release of SAS 9, there are more options and capabilities for intervals and their functions. This paper will first discuss the basics of intervals in detail, and then we will discuss several of the enhancements to the interval feature, such as the ability to select how the INTCK() function defines interval boundaries and the ability to create your own custom intervals beyond multipliers and shift operators.


BB047 : Extraction and Use of Text Strings with SAS when Source exceeds the 32k String Length Limit
John Schmitz, Luminare Data

Database systems support text fields that can be much larger than those supported by SAS® and SAS/ACCESS®. These fields may contain notes, unparsed and unformatted text, or XML data. This paper offers a solution for lengthy text data. Using SQL explicit pass-through, minimal native SQL coding, and some SAS macro logic, SAS developers can easily extract and store these strings as a set of substring elements that SAS can process and store without data loss.
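One shape such an extraction can take, sketched with explicit pass-through (the DSN, table, and column names are placeholders, and a real implementation would generate the chunk numbers with macro logic rather than hard-coding two of them):

```sas
proc sql;
  connect to odbc (dsn=srcdb);          /* placeholder connection   */
  create table work.note_chunks as
  select * from connection to odbc
    ( select note_id, 1 as chunk_no,
             substring(note_text from 1 for 32000) as chunk
        from note_table
      union all
      select note_id, 2 as chunk_no,
             substring(note_text from 32001 for 32000) as chunk
        from note_table
       where char_length(note_text) > 32000 );
  disconnect from odbc;
quit;
```

The substringing runs on the database side, so each piece arrives in SAS already under the 32K character-variable limit; the chunks can later be reassembled or processed in CHUNK_NO order.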


BB049 : SAS Programming Efficiency: the Basic Concepts and Tips
Lingqun Liu, University of Michigan

This paper introduces basic SAS programming efficiency concepts and tips. The first section explains what SAS programming efficiency is in terms of computer resources and human resources; some SAS utilities are also presented to help gauge the performance of SAS processing. The second section covers the most basic and frequently used SAS technical tips to help SAS developers and programmers improve the performance of their SAS applications. The last section includes some real-world examples to demonstrate the benefits these tips can bring. Keywords: efficiency; resources; computer resources (CPU, I/O, memory, storage); human resources; trade-offs; technical tips; simplicity.


BB050 : Efficient Fact Processing using Hash Tables and Bitwise Operations
Matthew Nizol, ArborMetrix

Quality measures for healthcare research are often defined in terms of a series of inclusion and exclusion rules that determine the cohort of patients included in the measure. Frequently, these inclusion and exclusion rules can be expressed as conjunctions or disjunctions of Boolean-valued statements, or facts, about a patient or an encounter. While in general facts may be defined via any logical expression, measure definitions commonly include one or more facts defined by a list of diagnosis or procedure codes. For a single measure, utilizing standard table joins or SAS® formats to check whether a particular diagnosis or procedure code maps to a fact is an adequate strategy. However, when computing tens or hundreds of measures whose underlying fact definitions overlap, traditional lookup methods do not scale. This paper discusses an efficient method for fact processing that utilizes hash tables and bitwise operations such that the facts about a patient or encounter can be computed in a single pass over the data. Specifically, the method associates each diagnosis or procedure code for an encounter with a bit string whose set bits indicate fact definitions that include that code. In this manner, each code need only be looked up in a hash table once per encounter record, rather than once per fact per encounter. Bitwise operations combine the bit strings across the records for the same encounter. The algorithm has O(N) runtime complexity and is more efficient in practice than alternative methods based on either SQL joins or SAS formats.
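A sketch of the single-pass idea described above (not the author's code; the data set and variable names are hypothetical, and WORK.CODE_BITS is assumed to hold one row per diagnosis code with an integer whose set bits mark the fact definitions that include it):

```sas
data encounter_facts;
  if 0 then set work.code_bits;         /* defines DX_CODE and BITS */
  if _n_ = 1 then do;
    declare hash h (dataset:'work.code_bits');
    h.defineKey('dx_code');
    h.defineData('bits');
    h.defineDone();
  end;
  set work.encounters;                  /* sorted by ENCOUNTER_ID   */
  by encounter_id;
  retain fact_bits;
  if first.encounter_id then fact_bits = 0;
  /* One hash lookup per code; bitwise OR accumulates the facts.    */
  if h.find() = 0 then fact_bits = bor(fact_bits, bits);
  if last.encounter_id then output;     /* one row per encounter    */
  keep encounter_id fact_bits;
run;
```

Testing an individual fact afterward is a single bitwise AND against FACT_BITS, rather than a join per fact.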


BB071 : Fifteen Functions to Supercharge Your SAS Code
Josh Horstman, Nested Loop Consulting

The number of functions included in SAS software has exploded in recent versions, but many of the most amazing and useful functions remain relatively unknown. This paper will discuss such functions and provide examples of their use. Almost any SAS programmer should find something new to add to their toolbox.


BB113 : You Did That With SAS? Combining Text with Graphics Output to Create Great Looking Reports.
Ben Cochran, The Bedford Group

Using PROC DOCUMENT and other procedures and methods from the SAS ODS package enables a SAS user to create fantastic reports. And since ODS is a part of Base SAS, the DOCUMENT procedure is likewise a part of Base SAS. It can do many things to enhance output. The main thrust of this presentation is to illustrate combining a text file with SAS procedure (GPLOT and REPORT) output; this report is then used to create a PDF file. The paper shows how to go beyond the limitations of PROC DOCUMENT to deliver the complete package.


BB114 : Tackling Unique Problems by Using TWO SET Statements in ONE DATA Step
Ben Cochran, The Bedford Group

This paper illustrates solving many problems by creatively using TWO SET statements in ONE DATA step. Calculating percentages, conditional merging, conditional use of indexes, table lookups, and look-ahead operations are investigated in this paper.
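One classic instance of the technique is the percent-of-total calculation, sketched here (not the author's code) using the SASHELP.CLASS sample data: one SET statement reads a one-row summary exactly once, while the other reads the detail rows on every iteration.

```sas
/* Summarize once: total weight across all rows. */
proc means data=sashelp.class noprint;
  var weight;
  output out=total(keep=tot) sum=tot;
run;

data pct;
  if _n_ = 1 then set total;   /* first SET: runs once; TOT persists */
  set sashelp.class;           /* second SET: runs every iteration   */
  pct_weight = weight / tot * 100;
run;
```

Because variables read with SET are automatically retained, TOT stays available on every detail row without a RETAIN statement or a MERGE.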


BB124 : Exploring HASH Tables vs. SORT/DATA Step vs. PROC SQL
Lynn Mullins, PPD
Richann Watson, Experis

There are often times when programmers need to merge multiple SAS® data sets to combine data into one single source data set. Like many other processes, there are various techniques to accomplish this using SAS software. The most efficient method to use under varying assumptions is explored in this paper. We describe the differences, advantages, and disadvantages, and display benchmarks, of using hash tables, PROC SORT with the DATA step, and PROC SQL.


BB125 : One, Two, Three and You're ODS!! Three new Output Delivery System Procedures in SAS® 9.4: ODSLIST, ODSTABLE, and ODSTEXT
Lynn Mullins, PPD

The Output Delivery System (ODS) Report Writing Interface (RWI) enables you to create and manipulate predefined ODS objects in a DATA step to create highly customized output. SAS 9.4 includes three new ODS procedures: PROC ODSLIST, PROC ODSTABLE, and PROC ODSTEXT. These new procedures allow for the creation of specific types of output: lists, table templates bound to an input data set in a single statement, and text content rather than the usual tabular SAS output. This paper discusses these three new ODS procedures from SAS 9.4 and gives examples of how to use them in real-life applications.


BB129-SA : DATA Step in SAS Viya: Essential New Features
Jason Secosky, SAS

The DATA step is the familiar and powerful data processing language in SAS® and now SAS Viya. The DATA step's simple syntax provides row-at-a-time operations to edit, restructure, and combine data. New to the DATA step in SAS Viya are a varying-size character data type and parallel execution. Varying-size character data enables intuitive string operations that go beyond the 32KB limit of current DATA step operations. Parallel execution speeds the processing of big data by starting the DATA step on multiple machines and dividing data processing among threads on these machines. To avoid multi-threaded programming errors, the run-time environment for the DATA step is presented along with potential programming pitfalls. Come see how the DATA step in SAS Viya makes your data processing simpler and faster.


Data Visualization and Graphics

DV01 : An Introduction to ODS Statistical Graphics
Kirk Paul Lafler, Software Intelligence Corporation

Delivering timely, quality-looking reports, graphs, and information to management, end users, and customers is essential. This presentation provides SAS® users with an introduction to the ODS Statistical Graphics found in Base SAS software. Attendees learn basic concepts, features, and applications of ODS statistical graphics procedures for creating high-quality, production-ready output; an introduction to the SGPLOT, SGPANEL, and SGSCATTER procedures; and an illustration of plots and graphs including histograms, vertical and horizontal bar charts, scatter plots, bubble plots, vector plots, and waterfall charts.


DV02 : Waterfall Plots in Oncology Studies in the Case of Multi-Arms Design
Ilya Krivelevich, Eisai Inc
Kalgi Mody, Eisai Inc
Simon Lin, Eisai Inc

Clinical data are easier to understand when presented in a visual format. In oncology, in addition to the commonly used survival curves, other types of graphics can be helpful in describing response in a study. These plots are becoming more and more popular due to their easy-to-understand representation of data. Waterfall plots can help visualize tumor shrinkage or growth; in such plots, each patient in the study is represented by a vertical bar, and each bar represents the maximum change in the measurement of tumors. In studies with two arms, waterfall plots are often used to compare the outcome between arms. An excellent grounding for understanding waterfall plots appears in the article by Theresa W. Gillespie, PhD, MA, RN: "Understanding Waterfall Plots," Journal of the Advanced Practitioner in Oncology, 2012 Mar-Apr. That article claims that "a study using a randomization scheme other than 1:1 will not lend itself as well to a waterfall plot technique. As stated previously, since each vertical plot represents a single patient, waterfall plots limit the ability to portray different randomization schemes, e.g., 2:1 or 3:1." This presentation shows how we can solve this problem with new techniques, using PROC SGPANEL and the Graph Template Language.


DV04 : A Macro that can Create U.S State and U.S County KML Files
Ting Sa, Cincinnati Children's Hospital Medical Center

In this paper, a macro is introduced that can generate KML files for U.S. states and counties. The generated KML files can be used directly by Google Maps to add customized state and county layers with user-defined colors and transparencies. When the state and county layers are clicked on Google Maps, customized information appears. To use the macro, the user only needs to prepare a simple input SAS data set. The paper includes all the SAS code for the macro, provides examples showing how to use it, and shows how to display the KML files on Google Maps.


DV07 : A Big Data Challenge: Visualizing Social Media Trends about Cancer using SAS® Text Miner
Mia Lyst, Pinnacle Solutions, Inc
Scott Koval, Pinnacle Solutions, Inc
Yijie Li, Pinnacle Solutions, Inc.

Analyzing big data and visualizing trends in social media is a challenge that many companies face as large sources of publicly available data become accessible. While the sheer size of usable data can be staggering, knowing how to find trends in unstructured textual data is just as important an issue. At a Big Data conference, data scientists from several companies were invited to tackle this challenge by identifying trends in cancer using unstructured data from Twitter users and presenting their results. This paper explains how our approach using SAS analytical methods was superior to other Big Data approaches in investigating these trends.


DV08-SAS : Data Can Be Beautiful: Crafting a Compelling Story with SAS® Visual Analytics
Cheryl Coyle, SAS

Do your reports effectively communicate the message you intended? Are your reports aesthetically pleasing? An attractive report does not ensure the accurate delivery of a data story, nor does a logical data story guarantee visual appeal. This paper provides guidance for SAS® Visual Analytics Designer users to facilitate the creation of compelling data stories. The primary goal of a report is to enable readers to quickly and easily get answers to their questions. Achievement of this goal is strongly influenced by choice of visualizations for the data being shown, quantity and arrangement of the information included, and the use, or misuse, of color. This paper describes how to guide readers' movement through a report to support comprehension of the data story; provides tips on how to express quantitative data using the most appropriate graphs; suggests ways to organize content through the use of visual and interaction design techniques; and instructs report designers on color meaning, presenting the notion that even subtle changes in color can evoke different feelings than those intended. A thoughtfully designed report can educate the viewer without compromising visual appeal. Included in this paper are recommendations and examples which, when applied to your own work, will help you create reports that are both informative and beautiful.


Data for Good

DG01 : Correlation and Structural Equation Analysis on the Effects of Anti-Discrimination Policies and Resources on the Well Being of Lesbian, Gay, and Bisexual College Students
Brandy Sinco, University of Michigan
Michael Woodford, Wilfred Laurier University, Ontario, CA
Jun Sung Hong, Wayne State University
Jill Chonody, Indiana University

Methods: Among a convenience sample of cisgender LGBQ college students (n=268), we examined the association between college- and state-level structural factors and students' experiences of campus hostility and microaggressions, psychological distress, and self-acceptance. Relationships between these outcomes were first examined with Spearman correlation coefficients. Structural Equation Modeling (SEM) was used to explore the mediating relationship of college-level structural factors on discrimination, distress, and self-acceptance. SAS PROC CORR was used for the correlation analysis and PROC CALIS for the SEM. The EFFPART feature in PROC CALIS was used to test for a mediating effect from an inclusive non-discrimination policy through hostility and microaggressions to psychological distress. Results: State-level factors were not correlated with students' experiences or psychological well-being. Both the correlation matrix and SEM results suggested positive benefits from select college policies and resources, particularly non-discrimination policies that include both gender identity and sexual orientation (versus only sexual orientation). Based on the SEM and correlation matrix, a non-discrimination policy that included both sexual orientation and gender identity was significantly associated with lower microaggressions and overt hostility, p<.05. Higher ratios of LGBTQ student organizations to student body were also significantly associated with reduced microaggressions and hostility, in addition to lower stress and anxiety. The SEM model indices indicated good absolute fit, incremental fit, parsimony, and predictive ability, with CFI>.95 and RMSEA and SRMR<.05. Conclusion: An inclusive non-discrimination policy that includes transgender students also provides a healthier college environment for cisgender students.


DG02 : Exploring the Relationship Between Substance Abuse and Dependence Disorders and Discharge Status: Results and Implications
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine

The goal of this study was to investigate the association between substance abuse and dependence diagnoses and discharge status for patients admitted to a short-term acute care facility in the United States, while controlling for gender, age, marital status, region, admission type, primary form of payment, days of care, and race. A series of univariate and multivariate logistic regression analyses, as well as a propensity analysis, were conducted in SAS® 9.4 to explore the association of the target variable, a primary diagnosis of substance abuse or dependence, with the treatment variable, discharge status, and identified control variables among patients who were admitted to a short-term acute care facility in the United States. The results revealed a significant relationship between having a primary substance abuse or dependence diagnosis and discharge status while controlling for discharge status propensity and possible confounding variables. Significant and non-significant odds ratio effects are provided and reviewed. Results supported that patients with a primary diagnosis of substance abuse or dependence differ significantly in resulting discharge status from the rest of the patient population. Limitations and strengths of the data set used are discussed, and the effects of these limitations and strengths on the power and results of this model are reviewed. This paper is for any level of SAS user with an interest in the statistical evaluation of mental health care in acute care facilities.


DG03 : Lag Models with Social Response Outcomes
David Corliss, Peace-Work

Lag models are a type of time series analysis where the current value of an outcome variable is modeled based, at least in part, on previous values of predictor variables. This creates new opportunities for the use of social media data, both as the result of previous events and as predictors of future outcomes. This paper demonstrates lag models with social media data to establish a connection between severe solar storms and subsequent hardware failures based on complaints recorded in Twitter. The methodology is then used to investigate the possibility of a statistical link between hate speech and subsequent acts of violence against persons targeted by the speech.


DG04 : The (Higher) Power of SAS®
Andrea Frazier, Presence Health

Are there spiritual benefits to using SAS®? One synagogue in Chicago thinks so--and has its own all-volunteer informatics committee! The flexibility of SAS® software is well-known for business applications, but SAS® can also be used to improve membership data and evaluate programming, leading to a better congregant and community experience.


DG05 : The Influence of Dark Triad Personality Traits on Prosocial Behavior
Julie Mullins, Saint Louis University
Lynn Mullins, PPD

The personality traits of the Dark Triad (narcissism, psychopathy, and Machiavellianism) seem to be present in a person when there is a deficit in empathy. Previous research suggested that those who exhibit traits of the Dark Triad will be motivated by the desire to be praised and admired by others as well as to gain power. We sought to understand whether these desires can be evoked during a prosocial donation task. This study manipulated whether the participants would be hypothetically receiving potential admiration after donating money. From that, we were able to see if that need to gain affection from others increased the likelihood of engaging in prosocial behavior. The participants also completed several personality measures as well as a demographic questionnaire. Using SAS®, we performed correlations between each of the Dark Triad traits (psychopathy, narcissism, and Machiavellianism) and the donation task conditions, graphed psychopathic personality index scores, and produced demographic summary statistics.


Hands-on Workshops

HW01 : Hands-on Introduction to SAS® and the ODS Excel® Destination
Kirk Paul Lafler, Software Intelligence Corporation

SAS software is the gold standard for robust and reliable data access, manipulation, analytics, analysis, reporting and data discovery. Microsoft Excel is the most widely used software in the world. This hands-on workshop (HOW) demonstrates the various ways to transfer data, output and results between SAS and Excel software by presenting the most popular paths for connecting and sharing data and results between these software products.
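As a taste of what the workshop covers, here is a minimal ODS Excel sketch (the file path and sheet name are illustrative, not workshop materials) that sends procedure output straight to a native .xlsx workbook:

```sas
/* Route PROC output to a native .xlsx file (path is illustrative) */
ods excel file="class_report.xlsx"
    options(sheet_name="Heights" embedded_titles="yes");

title "Student Heights and Weights";
proc means data=sashelp.class n mean min max;
   class sex;
   var height weight;
run;

ods excel close;   /* close the destination to finish writing the file */
```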


HW02 : Using a Few SAS Functions to Clean Dirty Data
Ben Cochran, The Bedford Group

Manipulating data is a big part of what SAS programmers do, and a big part of data manipulation is cleaning dirty data. SAS has a number of functions that can make this task a little easier, and this HOW looks at many of them. Cleaning data and making it more consistent is the aim of this HOW.
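For a flavor of the kind of functions involved, consider this small sketch (the input data set and variables are hypothetical, not the workshop's own examples):

```sas
data clean;
   set dirty;                                 /* hypothetical raw data set  */
   name   = propcase(strip(name));            /* trim blanks, fix casing    */
   phone  = compress(phone, , 'kd');          /* keep digits only           */
   state  = upcase(state);                    /* standardize state codes    */
   street = tranwrd(street, 'Street', 'St.'); /* standardize abbreviations  */
run;
```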


HW03 : Base SAS® and SAS® Enterprise Guide® ~ Automate Your SAS World with Dynamic Code; Your Newest BFF (Best Friend Forever) in SAS
Kent Phelps, Illuminator Coaching, Inc.
Ronda Phelps, Illuminator Coaching, Inc.

Communication is the basic foundation of all relationships including our SAS relationship with the Server, PC, or Mainframe. To communicate more efficiently ~ and to increasingly automate your SAS World ~ you will want to learn how to transform Static Code into Dynamic Code that automatically recreates the Static Code, and then executes the recreated Static Code automatically. Our Hands-On-Workshop/presentation highlights the powerful partnership which occurs when Dynamic Code is creatively combined with a Dynamic FILENAME Statement, Macro Variables, the INDSNAME SET Option, and the CALL EXECUTE Command within 1 SAS Enterprise Guide Base SAS Program Node. You will have the exciting opportunity to learn how 1,469 time-consuming Manual Steps are amazingly replaced with only 1 time-saving Dynamic Automated Step. We invite you to attend our session where we will detail the UNIX syntax for our project example and introduce you to your newest BFF (Best Friend Forever) in SAS. Please see the Appendices to review starting point information regarding the syntax for Windows and z/OS, and to review the source code that created the data sets for our project example.
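As one hedged illustration of the dynamic-code idea (not the authors' actual project code), CALL EXECUTE can generate and run a PROC step for each BY group rather than coding each step manually:

```sas
/* Generate one PROC PRINT per value of SEX instead of coding each call */
proc sort data=sashelp.class out=class;
   by sex;
run;

data _null_;
   set class;
   by sex;
   if first.sex then
      call execute(cats('proc print data=class; where sex="', sex,
                        '"; title "Listing for Sex=', sex, '"; run;'));
run;
```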


HW04 : Hands-On with an Excel-Based Code Playground for Creating and Sharing SAS ODS Graphics
Ted Conway, Self

You've heard that SAS ODS Graphics provide a powerful and detailed syntax for creating custom graphs, but for whatever reason still haven't added it to your bag of SAS tricks. Let's change that! Workshop participants will quickly gain experience creating a variety of charts by using an Excel-based code "playground" to submit SAS code examples and view the results directly from Excel. More experienced users will also find the code playground useful for compiling SAS ODS Graphics code snippets for themselves and to share with colleagues, as well as for creating Excel-hosted dashboards containing precisely sized and placed SAS graphics. This workshop is intended for all SAS users and will use Microsoft Excel together with Base SAS (SAS Studio and SAS University Edition are not needed).


HW05 : Intermediate SAS® ODS Graphics
Chuck Kincaid, Experis Business Analytics

This paper builds on the knowledge gained in Intro to SAS® ODS Graphics. The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, a selection of the many wonderful capabilities was made. This paper will look at both types of capabilities and provide the reader with more tools for their belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We will explore tables, annotation, and changing attributes, as well as the BLOCK plot. Any user of Base SAS on any platform will find great value in the SAS ODS Graphics procedures. Some experience with these procedures is helpful, but not required.


Pharmaceutical Applications

PH01 : Challenges Faced in Mapping MRI Data to SDTM and ADaM
Lingling Xie, Eli Lilly and Company
Xiaoqi Li, Eli Lilly and Company

Our studies collect Magnetic Resonance Imaging (MRI) data for the spine and sacroiliac joints at multiple time points. The raw data is huge, containing more than 600 records from each of two readers and a possible adjudicator at each time point. In SDTM, we map the data to the XP domain. In ADaM, we map it to a BDS structure that contains more than 600 parameters combining information from test location, structure signal, laterality, and slice, with five additional parameters for total scores derived from five complicated scoring algorithms, including rules for handling missing data and for consolidating results between readers. We use programming to make the specification writing and dataset programming more efficient and to prevent manual typing errors. It is a challenging task, but we ultimately produced SDTM and ADaM datasets that suit our analysis needs.


PH02 : ISO 8601 and SAS®: A Practical Approach
Derek Morgan, PAREXEL

The ISO 8601 standard for dates and times has long been adopted by regulatory agencies around the world for clinical data. While there are many homemade solutions for working in this standard, SAS has many built-in solutions, from formats and informats that even take care of time zone specification, to the IS8601_CONVERT routine, which painlessly handles durations and intervals. These built-in capabilities, available in SAS® 9.2 and above, will streamline your code and improve efficiency and accuracy.
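A small sketch of the built-in support (SAS 9.2+):

```sas
data _null_;
   /* The E8601DT informat reads an ISO 8601 extended datetime */
   dtm = input('2017-08-10T14:30:00', e8601dt19.);
   put dtm= e8601dt19.;    /* writes it back: dtm=2017-08-10T14:30:00 */
   put dtm= datetime20.;   /* the same instant in SAS's usual display */
run;
```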


PH03 : Patient Safety and Disease Profiling
Patti Radke-Connell, Froedtert and the Medical College of Wisconsin

Quality of care is the focus of healthcare providers, and new government regulations focus on metrics to guide quality. The tools available from AHRQ allow healthcare providers to evaluate patient safety indicators and to profile patients into disease classifications based on their diagnosis and procedure codes. There are many challenges for hospitals in implementing quality of care programs, and this paper will focus on the ways that data and data profiling can provide decision-making capabilities for providers. The paper will discuss the many challenges with quality metrics and regulations, the value of quality programs, and how Base SAS can enhance these capabilities.


PH04 : AIR Binder 2.0: A Dynamic Visualization, Data Analysis and Reporting SAS Application for Preclinical and Clinical ADME Assays, Pharmacokinetics, Metabolite Profiling and Identification
Hao Sun, Covance, Inc.
Kristen Cardinal, Covance, Inc.
Richard Voorman, Covance, Inc.

Although regulatory agencies request pharmacometric datasets be submitted as SAS transport files for review, current clinical and preclinical ADME data analysis is handled mostly by non-SAS programs. Previously, we reported a SAS-based application, AIR Binder, for automatic analysis and reporting of a specific cytochrome P450 (CYP) inhibition assay, a key preclinical drug metabolism assay for the prediction of drug-drug interactions (PharmaSUG 2017). Thanks to the significantly improved productivity and efficiency, we expanded the application to a wide range of preclinical and clinical ADME assays for dynamic visualization, data analysis and reporting. Considering the complexity of data structures and presentation styles across these assays, SAS macros were designed and written to be more generalized and object-oriented. Key features include: various styles of ODS panel plots implemented to visualize profiles of metabolites for cross-species comparison and toxicology species selection; enhanced pharmacokinetic parameter analysis and display for metabolites; comprehensive statistical analysis of plasma protein binding data with PROC GLM; customized non-linear fitting for CYP inhibition and induction assays with kinetic parameter calculation and display using PROC NLIN. With the current infrastructure it is convenient to expand the program with the integration of new drug metabolism assay types for data analysis and visualization. Overall, AIR Binder 2.0 dynamically visualized data to efficiently convey information for quick decision making, which enhanced communications within study teams, between CRO and clients, and significantly shortened reporting turnaround time of drug metabolism projects for drug discovery and development.


PH05 : Automated Validation of Complex Clinical Trials Made Easy
Richann Watson, Experis
Josh Horstman, Nested Loop Consulting

Validation of analysis datasets and statistical outputs (tables, listings, and figures) for clinical trials is frequently performed by double programming. Part of the validation process involves comparing the results of the two programming efforts. COMPARE procedure output must be carefully reviewed for various problems, some of which can be fairly subtle. In addition, the program logs must be scanned for various errors, warnings, notes, and other information that might render the results suspect. All of this must be performed repeatedly each time the data is refreshed or a specification is changed. In this paper, we describe a complete, end-to-end, automated approach to the entire process that can improve both efficiency and effectiveness.
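One building block of such automation (a sketch only; the library and data set names are hypothetical, and this is not the authors' full system) is checking PROC COMPARE's result code programmatically rather than eyeballing the listing:

```sas
proc compare base=prod.adsl compare=qc.adsl listall;
run;

/* &SYSINFO holds a bit-coded result; 0 means the data sets match */
%macro chkcomp;
   %if &sysinfo = 0 %then %put NOTE: Production and QC datasets match.;
   %else %put WARNING: PROC COMPARE found differences (code=&sysinfo).;
%mend chkcomp;
%chkcomp
```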


PH06 : ADQRS: Basic Principles for Building Questionnaire, Rating and Scale Analysis Datasets
Nancy Brucken, InVentiv Health Clinical
Karin Lapann, Shire

Questionnaires, ratings and scales (QRS) are frequently used as primary and secondary analysis endpoints in clinical trials. The Submission Data Standards (SDS) QRS sub-team has compiled a considerable library of SDTM supplements defining standards for the collection and storage of QRS data. The ADaM ADQRS sub-team has been formed to develop addenda to these supplements, which will define standards for corresponding analysis datasets. This paper represents the current thinking of the ADQRS sub-team regarding basic principles for building QRS analysis datasets.


PH07 : Standard Two-Formulation Bioequivalence Testing
Keith Dunnigan, QuintilesIMS

One of the more common clinical trials encountered in the pharmaceutical industry is bioequivalence testing, whereby two formulations of the same active ingredient are compared as to their pharmacokinetics (time profile of drug concentration levels). Common examples include generic drug testing, as well as comparing different dose forms (for instance tablet versus capsule) in early clinical development. This presentation will serve as a first introduction and example as to the design, mathematics and SAS programming commonly used in the standard two formulation bioequivalence experiment.


Rapid Fire

RF01 : Ignorance is not bliss - understanding SAS applications and product contents
Jayanth Iyengar, Data Systems Consultants LLC

Have you ever heard 'SAS Display Manager' mentioned in a presentation or discussion and wondered to yourself, what exactly is SAS Display Manager? Or have you come across 'SAS/SQL' or 'SAS/Macros' in a job description and thought that SQL or Macros were separate modules of the SAS System? There's a fair amount of confusion and misinformation regarding SAS products and what they're composed of, even amongst experienced SAS users. In this paper, I attempt to provide a proper understanding of SAS components, and distinguish between SAS applications and SAS modules. I also show how to determine what products are licensed and installed in the SAS windowing environment.
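For instance, two short steps report what a site has:

```sas
proc setinit;          /* lists the products licensed at your site */
run;

proc product_status;   /* lists the products actually installed    */
run;
```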


RF02 : Quotes within Quotes: When Single (') and Double (") Quotes are not Enough
Art Carpenter, CA Occidental Consultants

Although it does not happen every day, it is not unusual to need to place a quoted string within another quoted string. Fortunately SAS® recognizes both single and double quote marks, and either can be used within the other. This gives us the ability to do two-deep quoting. There are situations, however, where two kinds of quotes are not enough. Sometimes we need a third layer, or, more commonly, we need to use a macro variable within the layers of quotes. Macro variables can be especially problematic, as they will generally not resolve when they are inside single quotes. However, this is SAS, and that implies that there are several things going on at once and several ways to solve these types of quoting problems. The primary goal of this paper is to assist the programmer with solutions to the quotes-within-quotes problem, with special emphasis on the presence of macro variables. The various techniques are contrasted, as are the likely situations that call for these types of solutions. A secondary goal is to help the reader understand how SAS works with quote marks and how it handles quoted strings. Although we will not go into the gory details, a surface understanding can be useful in a number of situations.
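A minimal sketch of the core behavior at issue:

```sas
%let name = Smith;

title1 "Report for &name";   /* double quotes: resolves to Smith     */
title2 'Report for &name';   /* single quotes: &name stays literal   */

/* Quotes within quotes: alternate the quote types */
%let subset = where=(lastname="O'Brien");
```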


RF03 : No News Is Good News: A Smart Way to Impute Missing Clinical Trial Lab Data
Ming Yan, Eli Lilly

In clinical trials, specific lab microscopic UA, RBC morphology, and special WBC subordinate tests are reported to sponsors by a lab ONLY if an abnormality is observed. The normal results, which are not explicitly reported, nevertheless need to be in place in order to compute a percentage of abnormalities for each lab test in the subject population. This macro starts from an SDTM LB domain that stores only the observed abnormal lab tests and converts it to ADaM with all the normal test results filled in for all patients at all time points.


RF05 : Macro that can Provide More Information for your Character Variables
Ting Sa, Cincinnati Children's Hospital Medical Center

Sometimes we want to convert character variables to numeric variables in batches, but first we may need to manually check whether those variables contain only values that can be converted to numeric type. Sometimes we want to make all the dates or datetimes consistent among SAS data sets, but if those variables are saved as character variables and we don't have a data dictionary, we have to check the data sets manually. And sometimes character variables contain only missing values, and we want to delete them to save space. Using the macro in this paper, you can get this information for each character variable without checking them one by one. The macro can check all the character variables in a library or in selected data sets. After the macro runs, an HTML report and an Excel report are generated with information about each checked character variable; users can use the Excel report to filter the information further. Based on this information, users can decide what to do with each character variable: for example, a character variable that contains only numeric values can be converted to a numeric variable, and character variables that contain only missing values can be deleted.


RF06 : Cleaning Messy Data: SAS Techniques to Homogenize Tax Payment Data
Aaron Barker, Iowa Department of Revenue

One challenge frequently encountered by analysts aggregating over large volumes of data is dealing with inconsistencies among data sources. Fortunately, all versions of SAS have a wide array of tools available which a user of any skill level can employ to remedy such problems. This paper uses example tax payment data to demonstrate tools which have been employed to do a novel analysis of citizens' experience with the tax system in Iowa. The first method highlighted is conditional logic on identifier variables to determine how best to interpret the data in the identifier column. This is shown both using IF THEN logic and then again using SELECT and WHEN statements. The second method is determining when identifiers are of different types (e.g. permit numbers vs. Social Security Numbers) and then bringing in external data sources to recode inconsistent identifiers into consistent ones. This is shown using both PROC SQL as well as the traditional MERGE statement. The end result of these procedures is a data set that allows the analyst to see how many times a given individual or business interacts with the Department and provides valuable insight into the preferred payment method of taxpayers by tax type, payment type, and frequency of payments.
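The SELECT/WHEN pattern described above might look like this sketch (the data set and classification rules are hypothetical, not the Department's actual logic):

```sas
data typed;
   set payments;                        /* hypothetical payment records */
   length idtype $8;
   select;                              /* SELECT with no expression    */
      when (notdigit(strip(id)) > 0) idtype = 'PERMIT';  /* non-numeric */
      when (length(strip(id)) = 9)   idtype = 'SSN';     /* 9 digits    */
      otherwise                      idtype = 'UNKNOWN';
   end;
run;
```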


RF07 : PROC DOC III: Self-generating Codebooks Using SAS®
Louise Hadden, Abt Associates Inc.

This paper will demonstrate how to use good documentation practices and SAS® to easily produce attractive, camera-ready data codebooks (and accompanying materials such as label statements, format assignment statements, etc.). Four primary steps in the codebook production process will be explored: use of SAS metadata to produce a master documentation spreadsheet for a file; review and modification of the master documentation spreadsheet; import and manipulation of the metadata in the master documentation spreadsheet to self-generate code to be included to generate a codebook; and use of the documentation metadata to self-generate other helpful code such as label statements. Full code for the example shown (using the SASHELP.HEART data base) will be provided upon request.


RF08 : What Are Occurrence Flags Good For Anyway?
Nancy Brucken, InVentiv Health Clinical

The ADaM Structure for Occurrence Data (OCCDS) includes a series of permissible variables known as occurrence flags. These are optional Y/null flags indicating the first occurrence of a particular type of record within a subject. This paper shows how occurrence flags can be used with PROC SQL to easily produce tables summarizing adverse events (AEs) by System Organ Class (SOC) and dictionary preferred term.
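A sketch of the kind of query involved, assuming a standard ADaM ADAE data set with the AOCCPFL first-occurrence-within-preferred-term flag:

```sas
/* Subjects with at least one AE, by SOC and preferred term */
proc sql;
   select aesoc, aedecod,
          sum(aoccpfl = 'Y') as n_subjects
      from adae
      group by aesoc, aedecod;
quit;
```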


SAS 101

SA01 : An Introduction to PROC REPORT
Kirk Paul Lafler, Software Intelligence Corporation

SAS® users often need to create and deliver quality custom reports and specialized output for management, end users, and customers. The SAS System provides users with the REPORT procedure, a canned Base SAS procedure, for producing quick and formatted detail and summary results. This presentation is designed for users who have no formal experience working with the REPORT procedure. Attendees learn the basic PROC REPORT syntax using the COLUMN, DEFINE, and other optional statements, along with procedure options, to produce quality output; explore basic syntax to produce basic reports; compute subtotals and totals at the end of a report using a COMPUTE block; calculate percentages; produce statistics for analysis variables; apply conditional logic to control summary output rows; and enhance the appearance of output results with basic Output Delivery System (ODS) techniques.
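A minimal example of the syntax covered:

```sas
proc report data=sashelp.class nowd;
   column sex n height weight;
   define sex    / group 'Sex';
   define n      / 'Count';
   define height / analysis mean format=5.1 'Mean Height';
   define weight / analysis mean format=5.1 'Mean Weight';
   rbreak after  / summarize;    /* overall summary row */
run;
```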


SA02 : If you need these OBS and these VARS, then drop IF, and keep WHERE
Jayanth Iyengar, Data Systems Consultants LLC

Reading data effectively in the DATA step requires knowing the implications of various methods and of DATA step mechanics: the observation loop and the PDV. The impact is especially pronounced when working with large data sets. Individual techniques for subsetting data have varying levels of efficiency and implications for input/output time. Use of the WHERE statement/option to subset observations consumes fewer resources than the subsetting IF statement. Also, use of DROP and KEEP to include or exclude variables can be efficient, depending on how they're used.
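A sketch of the combination described:

```sas
/* KEEP= limits the variables read; WHERE= filters observations
   before they ever enter the program data vector */
data tall;
   set sashelp.class(keep=name sex height
                     where=(height > 60));
run;
```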


SA03 : The Essentials of SAS® Dates and Times
Derek Morgan, PAREXEL

The first thing you need to know is that SAS® stores dates and times as numbers. However, this is not the only thing that you need to know, and this presentation will give you a solid base for working with dates and times in SAS. It will also introduce you to functions and features that will enable you to manipulate your dates and times with surprising flexibility. This paper will also show you some of the possible pitfalls with dates (and times and datetimes) in your SAS code, and how to avoid them. We'll show you how SAS handles dates and times through examples, including the ISO 8601 formats and informats, how to use dates and times in TITLE and/or FOOTNOTE statements, and close with a brief discussion of Excel conversions.
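A few of these ideas in miniature:

```sas
data _null_;
   d = '10AUG2017'd;                 /* a date literal is just a number */
   put d=;                           /* d=21041 (days since 01JAN1960)  */
   put d= date9. d= e8601da.;        /* 10AUG2017 and 2017-08-10        */
   nextq = intnx('qtr', d, 1, 'b');  /* first day of the next quarter   */
   put nextq= date9.;                /* 01OCT2017                       */
run;
```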


SA04 : PROC SORT (then and) NOW
Derek Morgan, PAREXEL

With the advent of big data, faster sorting methods have reduced the use of the old staple, PROC SORT. This paper brings some of the useful features added to PROC SORT to light; it's not as much of a dinosaur as you might think.
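One of the newer conveniences, sketched with hypothetical data set names:

```sas
/* NODUPKEY keeps the first record per BY key; DUPOUT= captures
   the dropped duplicates so nothing disappears silently */
proc sort data=raw out=unique nodupkey dupout=dups;
   by subject_id visit;
run;
```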


SA05 : Working with Datetime Variable from Stata
Haiyin Liu, University of Michigan
Wei Ai, University of Michigan

Many SAS® users frequently need to transfer Stata data files into SAS. However, you must be careful when converting Stata datetimes to SAS datetimes. In this paper, we examine how SAS and Stata store Datetime variables differently. We propose a correction function to accurately transfer Stata datetimes into SAS.
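The underlying difference: Stata's %tc clock values count milliseconds since 01JAN1960, while SAS datetimes count seconds since the same epoch. A sketch of the fix (variable and data set names are hypothetical):

```sas
data sasdt;
   set stata_import;                /* hypothetical imported data set */
   sas_dtm = stata_dtm / 1000;      /* milliseconds -> seconds        */
   format sas_dtm datetime20.;
run;
```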


SA06 : Merge with Caution: How to Avoid Common Problems when Combining SAS Datasets
Josh Horstman, Nested Loop Consulting

Although merging is one of the most frequently performed operations when manipulating SAS datasets, there are many problems which can occur, some of which can be rather subtle. This paper illustrates common merge issues using examples. We examine what went wrong by walking step-by-step through the execution of each example. We look under the hood at the internal workings of the DATA step and the program data vector (PDV) to understand exactly what is going wrong and how to fix it. Finally, we discuss best coding practices to avoid these problems in the first place.


SA07 : Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS® Code
Josh Horstman, Nested Loop Consulting

Nearly every SAS® program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using IF-THEN/ELSE syntax. In this paper, we will explore various ways to construct conditional SAS logic, including some that may provide advantages over the IF statement. Topics will include the SELECT statement, the IFC and IFN functions, the CHOOSE and WHICH families of functions, as well as some more esoteric methods. We'll also make sure we understand the difference between a regular IF and the %IF macro statement.
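A taste of the function-based alternatives:

```sas
data grades;
   set sashelp.class;
   /* IFC returns one of two character values based on a condition */
   sizegrp   = ifc(height > 60, 'Tall', 'Short');
   /* IFN is the numeric counterpart */
   allowance = ifn(age >= 13, 10, 5);
run;
```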


SA09 : Parsing Useful Data Out of Unusual Formats Using SAS®
Andrew Kuligowski, HSN

Most Introduction to Programming courses will include a section on reading external data; the first assumption they make will be that the data are stored in some sort of documented and consistent format. Fortunately, in the real world, a lot of the data we deal with has the same basic assumption of occurring in a documented, consistent format - a lot of it, but not all of it. This presentation will address some techniques that can be used when we are not dealing with cleanly formatted data, when the data we want is in a less-than-ideal format, perhaps intermingled with or seemingly buried under unnecessary clutter. It will discuss the principles of using SAS® to parse a file to extract useful data from a normally unusable source. This will be accomplished by citing examples of unusual data sources and the SAS code used to parse them.


SA10 : The Building Blocks of SAS® Datasets - S-M-U (Set, Merge, and Update)
Andrew Kuligowski, HSN

S-M-U. Some people will see these three letters and immediately think of the abbreviation for a private university and associated football team in Texas. Others might treat them as a three-letter word, and recall a whimsical cartoon character created by Al Capp many years ago. However, in the world of the SAS® user, these three letters represent the building blocks for processing SAS datasets through the SAS DATA step. S, M, and U are first letters in the words SET, MERGE, and UPDATE - the 3 commands used to introduce SAS data into a DATA step. This presentation will discuss the syntax for the SET, MERGE, and UPDATE commands. It will compare and contrast these 3 commands. Finally, it will provide appropriate uses for each command, along with basic examples that will illustrate the main points of the presentation.
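The two most common of the three, in miniature (data set names are hypothetical):

```sas
data stacked;                    /* SET concatenates data sets       */
   set year2016 year2017;
run;

data joined;                     /* MERGE joins them side by side    */
   merge demog(in=a) visits(in=b);
   by id;                        /* both inputs must be sorted by ID */
   if a and b;                   /* keep matching IDs only           */
run;
```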


SA11 : Before You Get Started: A Macro Language Preview in Three Parts
Art Carpenter, CA Occidental Consultants

Using the macro language adds a layer of complexity to SAS® programming that many programmers are reluctant to tackle. The macro language is not intuitive, and some of its syntax and logic run counter to similar operations in the DATA step. This makes the transfer of DATA step and PROC step knowledge difficult when first learning the macro language. So why should one make the effort to learn a complex, counterintuitive language? Before you start to learn the syntax (where to put the semicolon, and how to use the ampersand and percent sign), you need to have a basic understanding of why you want to learn the language in the first place. It will also help if you know a bit about how the language thinks. This overview provides the background that will enable you to understand the way that the macro language operates. This will allow you to avoid some of the common mistakes made by novice macro language programmers. First things first - before you get started with the learning process, you should understand these basic concepts.


SA12 : Writing Code With Your Data: Basics of Data-Driven Programming Techniques
Joe Matise, NORC

In this paper aimed at SAS® programmers who have limited experience with data step programming, we discuss the basics of Data-Driven Programming, first by defining Data-Driven Programming, and then by showing several easy-to-learn techniques to get a novice or intermediate programmer started using Data-Driven Programming in their own work. We discuss using PROC SQL SELECT INTO to push information into macro variables; PROC CONTENTS and the dictionary tables to query metadata; using an external file to drive logic; and generating and applying formats and labels automatically. Prior to reading this paper, programmers should be familiar with the basics of the data step; should be able to import data from external files; should have a basic understanding of formats and variable labels; and should be aware of both what a macro variable is and what a macro is. Knowledge of macro programming is not a prerequisite for understanding this paper's concepts.
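A sketch of the first technique mentioned, SELECT INTO:

```sas
/* Collect the numeric variable names of a data set into a macro
   variable, then let that list drive later code */
proc sql noprint;
   select name into :numvars separated by ' '
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS'
        and type = 'num';
quit;

%put Numeric variables: &numvars;   /* Age Height Weight */
```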


SA13 : Make That Report Look Great Using the Versatile PROC TABULATE
Ben Cochran, The Bedford Group

Several years ago, one of my clients was in the business of selling reports to hospitals. He used PROC TABULATE to generate part of these reports. He loved the way this procedure 'crunched the numbers', but not the way the final reports looked. He said he would go broke if he had to sell naked PROC TABULATE output. So, he wrote his own routine to take TABULATE output and render it through Crystal Reports. That was before SAS came out with the Output Delivery System (ODS). Once he got his hands on SAS ODS, he kissed his Crystal Reports license good-bye. This paper is all about using PROC TABULATE along with ODS to generate fantastic reports. If you want to generate BIG money reports with PROC TABULATE, this presentation is for you.
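A minimal pairing of the two, with an illustrative output file:

```sas
ods html file="tabulate_report.html" style=sapphire;  /* illustrative path */

proc tabulate data=sashelp.class format=8.1;
   class sex;
   var height weight;
   table sex all, (height weight)*(mean max);
run;

ods html close;
```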


Statistics / Advanced Analytics

AA01 : Using SAS to Compare Two Estimation Methods on the Same Outcome: Example from First Trimester Pregnancy Weights
Brandy Sinco, University of Michigan
Edith Kieffer, University of Michigan
Kathleen Welch, University of Michigan
Diana Welmerink Bolton, University of Michigan

Background. Accurate pre-pregnancy weight is important for weight gain recommendations during pregnancy. Weight gain is linear during the first trimester. Objective. Use a linear mixed model (LMM), with random slope and intercept, to predict weight at the end of the first trimester (week 13), for 276 women, from a study in which many pre-pregnancy weights were self-reported and likely inaccurate. Compare the predicted weights at week 13 from the LMM to weights computed by adding a constant per week for 13 weeks. Methods. For a sub-sample in which the weights at week 13 were known, error variances between predicted and self-reported weights were compared with a Proc Mixed random effects model and then by using Proc IML to conduct a likelihood ratio test for a variance comparison. Proc SGPlot produced box plots and histograms to graphically display the variances between the two methods. Next, indicators were created for weight categories (under-weight, normal, over-weight, obese) and excessive weight gain. Accuracy of categories can be compared with Proc Freq by comparing the 95% confidence intervals for Cohen's kappas. Error rates can be compared with Proc Freq by using the McNemar test on the projected categories from the two prediction methods. Further, Proc Logistic can be used to evaluate accuracy by comparing the areas under the ROC curves between models using predicted and self-reported pre-pregnancy weights. Results. The likelihood ratio test, kappa confidence interval, and McNemar's test indicated that weight prediction from the LMM had lower variance and error rates.


AA02 : Logistic Model Selection with SAS® PROC's LOGISTIC, HPLOGISTIC, HPGENSELECT
Bruce Lund, Independent Consultant

In marketing or credit risk a model with a binary target is often fit with logistic regression. In this setting the sample size is very large while the number of predictors may be approximately 100. But many of these predictors may be classification (CLASS) variables. This paper discusses the variable selection procedures that are offered by PROC LOGISTIC, PROC HPLOGISTIC, and PROC HPGENSELECT. These selection procedures include the best subsets approach of PROC LOGISTIC, selection by best SBC of PROC HPLOGISTIC, and selection by LASSO of PROC HPGENSELECT. The use of classification variables in connection with these selection procedures is discussed. Simulations are run to compare these methods on predictive accuracy and the handling of extreme multicollinearity.
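The three selection approaches named in the abstract can be sketched as follows (dataset and variable names here are invented for illustration; the paper's treatment of CLASS variables and simulation design is more involved):

```sas
/* Best subsets in PROC LOGISTIC (continuous predictors) */
proc logistic data=train;
   model bad(event='1') = x1-x10 / selection=score best=1;
run;

/* SBC-driven selection in PROC HPLOGISTIC */
proc hplogistic data=train;
   model bad(event='1') = x1-x10;
   selection method=forward(select=sbc choose=sbc);
run;

/* LASSO in PROC HPGENSELECT */
proc hpgenselect data=train;
   model bad(event='1') = x1-x10 / dist=binary;
   selection method=lasso;
run;
```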


AA04 : Claim Analytics
Mei Najim, Gallagher Bassett Services, Inc.

Claim analytics has been evolving in the insurance industry for the past two decades. This paper is organized in four parts: 1. The presentation first provides an overview of claim analytics. 2. Then, a common high-level claim analytics technical process for large data sets is introduced. The steps of this process include data acquisition, data preparation, variable creation, variable selection, model building (a.k.a. model fitting), model validation, and model testing. 3. A case study: Over the past couple of decades in the property & casualty insurance industry, around 20% of closed claims have settled with litigation, representing 70-80% of total dollars paid. Litigation is thus one of the main claim severity drivers. In this case study, we introduce the Workers' Compensation (WC) Litigation Propensity Predictive Model at Gallagher Bassett, which is designed to score open WC claims to predict their future litigation propensity. The data, a WC book of business covering a few thousand clients with millions of claims and thousands of variables, was explored to build the model. Multiple cutting-edge statistical and machine learning techniques (GLM logistic regression, decision tree, neural network, gradient boosting, etc.), along with WC business knowledge, are utilized to discover and derive complex trends and patterns across the WC book of business data to build the model. 4. Conclusion.


AA05 : Unconventional Statistical Models with the NLMIXED Procedure
Robin High, University of Nebraska Medical Center

SAS/STAT® and SAS/ETS® software have several procedures that estimate parameters of generalized linear models for a variety of continuous and discrete distributions. The GENMOD, COUNTREG, GLIMMIX, LIFEREG, and FMM procedures, among others, offer a flexible range of unconventional data analysis options, including zero-inflated, truncated, and censored response data. The COUNTREG procedure also includes the Conway-Maxwell-Poisson distribution and the negative binomial with a choice of two variance functions. The FMM procedure includes the generalized Poisson distribution as well as the ability to work with several truncated and zero-inflated distributions for both discrete and continuous data. This paper demonstrates how the NLMIXED procedure can be utilized to duplicate their results, first to gain insight into the complex computational details; the capability to enter programming statements into NLMIXED can then be expanded to handle even more unconventional data analysis situations.
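The core trick — coding a log likelihood directly in NLMIXED to duplicate a specialized procedure — can be sketched for the simplest case, a Poisson GLM as GENMOD would fit it (dataset and variables here are hypothetical):

```sas
/* Fit y ~ Poisson(exp(b0 + b1*x)) by supplying the log likelihood
   through MODEL ... ~ GENERAL(ll). */
proc nlmixed data=counts;
   parms b0=0 b1=0;
   eta    = b0 + b1*x;
   lambda = exp(eta);
   ll     = y*log(lambda) - lambda - lgamma(y + 1);
   model y ~ general(ll);
run;
```

Once the baseline model matches, the programming statements can be modified for truncation, zero inflation, or censoring.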


AA06 : Multiple Imputation of Family Income Data in the 2015 Behavioral Risk Factor Surveillance System
Jia Li, NIOSH
Aaron Sussell, NIOSH

Multiple imputation methods are increasingly used to handle missing data in statistical analyses of observational studies to reduce bias and improve precision. SAS/STAT® PROC MI can be used to impute continuous or categorical variables with a monotone or arbitrary missing pattern. This study used the fully conditional specification (FCS) method to impute the family income variable in the 2015 Behavioral Risk Factor Surveillance System (BRFSS) data. BRFSS is a health survey that collects state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. In this paper, the study population was restricted to currently employed respondents (age>=18) from the 25 states that collected industry and occupation information. Of the total 87,483 respondents, 11% were missing income information. To impute the missing income data, all variables in the survey that are correlated with either income or missingness of income (N=28) were selected as covariates. BRFSS sample design variables that represent stratification and unequal sampling probabilities were also included in the imputation model to improve validity. The FCS method was chosen due to an arbitrary missing pattern and mixed data types among income and all covariates. Logistic regression and discriminant function options were used for imputing binary and ordinal/nominal variables respectively. Results show a significantly different distribution in imputed income values compared to the observed values, suggesting that using the traditional complete case analysis approach to analyze BRFSS income data may lead to biased results.
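A minimal sketch of the PROC MI setup the abstract describes — FCS with the logistic option for a binary covariate and the discriminant function option for a nominal variable (variable names below are invented, not the authors' model):

```sas
/* FCS imputation for an arbitrary missing pattern with mixed types */
proc mi data=brfss nimpute=5 seed=20150 out=brfss_mi;
   class smoker income_cat;
   fcs logistic(smoker) discrim(income_cat / classeffects=include);
   var age bmi smoker income_cat;
run;
```

Continuous variables in the VAR list are imputed by the default FCS regression method.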


AA07 : GREMOVE, Reassign, and let's GMAP! A SAS Trick for Generating Contiguous Map Boundaries for Market-Level Research
Chad Cogan, Arbor Research Collaborative for Health
Jeffrey Pearson, Arbor Research Collaborative for Health
Purna Mukhopadhyay, Arbor Research Collaborative for Health
Charles Gaber, Arbor Research Collaborative for Health
Marc Turenne, Arbor Research Collaborative for Health

In health services research, accurate health care market definitions are crucial for assessing the potential market-level consequences of policy changes. Political units of geography (e.g. counties) are generally not sufficient for capturing the service area of a provider. Alternatively, researchers may generate customized boundaries using data-driven approaches based on patient flow only to find that their newly defined areas are not contiguous. We use a novel approach to correct for the lack of contiguity using the information produced by the GREMOVE procedure. GREMOVE is often used along with the GMAP procedure when there is a need to generate customized boundaries on a map by removing the internal boundaries of smaller units of geography. However, SAS users may not be aware of the logic used by PROC GREMOVE to assign segment values and the underlying data that goes into the maps. We first examine the logic used by PROC GREMOVE, and the map output dataset it produces. We identify some potential limitations of GREMOVE along with some alternative uses, which we demonstrate using basic polygons. We then look at customized map boundaries produced using a data-driven approach to combine zip code tabulation areas (ZCTAs) based on patient flow and show how GREMOVE identifies non-contiguous segments in a newly defined area. We then use a SAS trick to modify the GREMOVE logic for segment assignment, and generate new contiguous boundaries.


AA08 : Correcting for Selection Bias in a Clinical Trial
Shana Kelly, Spectrum Health

Selection bias occurs when the data do not represent the intended population and randomization fails to balance all potential confounding factors. Selection bias can produce misleading results in statistical analysis and should be corrected for. This paper explores a few alternative techniques to correct for a disparity between the comparison groups in a clinical trial. Food Prescription is a small clinical trial conducted by Spectrum Health to encourage impoverished individuals in the Grand Rapids community with a chronic disease, such as diabetes, to consume more fresh fruits and vegetables. Health outcomes are compared between the treatment and control groups after taking into account all covariates. The procedures shown are produced using SAS® Enterprise Guide 7.1.


AA11 : Dimensionality Reduction using Hadamard, Discrete Cosine and Discrete Fourier Transforms in SAS
Mohsen Asghari, Computer Engineering and Computer Science Department, University of Louisville
Aliasghar Shahrjooihaghighi, Computer Engineering and Computer Science Department, University of Louisville
Ahmad Desoky, Computer Engineering and Computer Science Department, University of Louisville

Dimensionality reduction studies various techniques to transform data in the most compact and efficient manner that allows modeling, analyzing, and predicting information with insignificant errors. Principal component analysis (PCA) is a method for reducing the dimensionality by decreasing the number of variables and selecting a smaller subset of uncorrelated transformed variables called principal components. PCA is data dependent and requires the computation of the correlation matrix of the input data as well as the Singular Value Decomposition (SVD) of that matrix. The Hadamard transform, Discrete Cosine Transform (DCT), and Discrete Fourier Transform (DFT) are orthogonal transformations that are not data dependent and reduce the dimensionality by decreasing the correlation of the transform components. In this paper, we implemented Hadamard, DCT, and DFT in SAS on a standard dataset. We also compared the results of these transformations against the PCA technique.


AA12 : Nothing to SNF At: Evaluating an intervention to reduce skilled nursing home (SNF) length of stay
Andrea Frazier, Presence Health

Length of stay (LOS) in skilled nursing facilities (SNF, pronounced "sniff") is a driver of high health care costs, particularly for Medicare patients. This study used survival analysis techniques to examine the effect of a simple intervention (educating providers and case managers on expected LOS for each patient) on SNF LOS. We'll also discuss techniques used to mitigate particular data collection challenges in this study.


AA13 : How Can an NBA Player Be Clutch?: A Logistic Regression Analysis
Logan Edmonds, Oklahoma State University

Many NBA players are known as clutch shooters who put fear in the opposing team as the clock is winding down. Michael Jordan is remembered as much for his game-ending shots as for his high-flying dunks. We know that these players can make the shot, but are there certain situations that contribute to the likelihood the game winner will go in? Using PROC LOGISTIC, this paper determines the key components of made shots during the crucial last two minutes of an NBA game. Using shot log data from the 2014-2015 NBA season, over 120,000 shots were filtered to those occurring in the last two minutes of regulation or overtime. Many things can affect whether a player will make these high-pressure shots. Specifically, the effects of home court, back-to-back game fatigue, shot distance, and dribble time before the shot are considered as possible predictors of clutch shooting. Assessment of only final game shots includes discussion of how teams could potentially use these variables in designing end-of-game plays. Finally, this analysis seeks a quantitative assessment of how players considered by pundits to be 'clutch' shooters live up to their billing.


AA14 : Multicollinearity: What Is It and What Can We Do About It?
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine

Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper will review and provide examples of the different ways in which multicollinearity can affect a research project, how to detect multicollinearity, and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, this paper will explore the proposed techniques through utilization of the Behavioral Risk Factor Surveillance System dataset. This paper is intended for any level of SAS® user and is written for an audience with a background in behavioral science and/or statistics.
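A common first diagnostic for multicollinearity is variance inflation factors and collinearity diagnostics in PROC REG; a minimal sketch using a built-in SASHELP table (the paper works with the BRFSS data instead):

```sas
/* VIF, tolerance, and eigenvalue-based collinearity diagnostics.
   RUNPULSE and MAXPULSE in SASHELP.FITNESS are strongly correlated,
   so this model illustrates inflated VIFs. */
proc reg data=sashelp.fitness;
   model oxygen = age weight runtime runpulse maxpulse restpulse
         / vif tol collin;
run;
quit;
```

A rule of thumb often cited is that a VIF above 10 signals problematic collinearity.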


AA15 : OSCARS 2017 - TEXT MINING & SENTIMENTAL ANALYSIS
Karthik Sripathi, Oklahoma State University

Mentor(s): Dr. Goutam Chakraborthy, Dr. Miriam McGaugh. It has always been fascinating to see how the magnitude of award shows has been increasing year after year. It is the enormously positive response of the audience that keeps these stage shows going. We know that the sentiments of people play a crucial role in deciding the prospects of a particular event. This paper summarizes the sentiments of individuals toward one of the most popular award shows, the Oscars. It provides crucial insights on how people's sentiments can determine the success or failure of a show. The paper involves text mining of people's reactions to the 2017 Oscars in general and a sentiment analysis regarding the Best Picture mix-up using SAS® Sentiment Analysis Studio. Social media has evolved as a platform where we can directly evaluate people's liking or disliking of an event. Understanding their opinions on a social media platform opens an unbiased environment: there are no filters to the way people react to an event, and the information that we can tap into from such a platform gives us different perspectives. By analyzing this information, improvements can be suggested for future events. We also get a sense of how people react to an unexpected event at large award show productions. This paper aims to determine the success of an awards show based on individual sentiments before, during, and after the show. This information gives a better picture of how to handle unwanted circumstances during the event. We can conclude from the 2017 Oscars that the sentiments of the people were more positive or neutral, indicating that the excitement about the show will overshadow any unwanted events. This analysis can be extended to build a predictive text model, wherein there is scope for predicting sentiments toward unwanted events, helping set the stage better and be prepared for potential problems.


AA16 : Text and Sentiment Analysis of customer tweets of Nokia using SAS® Enterprise Miner and SAS® Sentiment Analysis Studio
Vaibhav Vanamala, Oklahoma State University

The launch of new Nokia phones has produced significant and trending news throughout the globe. There has been a lot of hype and buzz around the release of these Nokia phones in the mobile market at Mobile World Congress 2017. As a result, there has been a significant social media response after the launch. Social media provides a platform for millions of people to share or express their unbiased opinions. In this paper, my aim is to analyze the overall sentiment prevailing in social media posts related to the release of the Nokia phones. To achieve this, I extracted real-time data from Twitter using the Twitter API from February 26, 2017 to March 26, 2017, which resulted in about 38,000 tweets and retweets. I used SAS Enterprise Miner and SAS Sentiment Analysis Studio to evaluate key questions regarding the launch of the Nokia phones, such as understanding the needs and expectations of customers, the perception of people about the launch, and how to increase Nokia's revenue by meeting customer expectations and through targeted marketing. This paper helps Nokia improve the quality of its phones according to the expectations and needs of customers.


AA17 : Tornado Inflicted Damages Pattern
Vasudev Sharma, Oklahoma State University

On average, about a thousand tornadoes hit the United States every year. Three out of every four tornadoes in the world occur in the United States. They damage life and property in their path, and they often hit with very little, sometimes no, warning. Tornadoes cause approximately 70 fatalities and 1,500 injuries in the US every year. Once, a tornado destroyed an entire motel in Oklahoma, and the motel's sign was later recovered in Arkansas. Tornadoes most frequently hit Tornado Alley, which is mainly made up of Nebraska, South Dakota, Oklahoma, Texas, and Kansas. A tornado extends from a thunderstorm to the ground and appears as a funnel-shaped cloud rotating with winds that can reach 300 miles per hour and can exceed a one-mile radius. Tornadoes can travel very long distances, making them very devastating. Since the ability to detect the intensity and direction of tornadoes prior to formation is limited, predicting the likelihood a tornado will form with accuracy can save many lives, as well as property. The purpose of the study is to find a pattern in the fatalities, injuries, and property loss caused by tornadoes. The tools used are Base SAS, SAS Enterprise Miner, R, and Tableau. The results include statistical analysis, descriptive analysis, predictive analysis, and visualizations from these tools.


AA18 : Agricultural Trip Generation - Linking Spatial Data and Travel Demand Modeling using SAS
Alan Dybing, North Dakota State University - Upper Great Plains Transportation Institute

Software Used: SAS 9.4 TS Level 1M2, X64_8PRO Platform on Windows 10 Pro. Audience Level: The SAS techniques described in the paper can be replicated by users with general mastery of DATA step techniques; data linkages from GIS and Cube Voyager require advanced knowledge of those software packages. The four-step travel demand modeling (TDM) procedure is commonly used to estimate and forecast traffic volumes for use in transportation planning. The trip generation step of the four-step model seeks to estimate the trip attractions and productions representing individual trips originating or terminating within a geographic boundary. Freight trip generation in rural areas within the Great Plains primarily results from agricultural production and marketing. This paper outlines a procedure for utilizing satellite imagery to estimate agricultural truck trip production at the township level, linking ArcGIS and SAS to generate trip generation tables for TDM purposes. The National Agricultural Statistics Service (NASS) Cropland Data Layer (CDL) is a data source produced using a combination of satellite imagery, ground-truth surveys, and data mining techniques, resulting in a digital raster map providing land use type at a 30-meter (0.25 acre) resolution. At the township level, the raster data was converted to polygons, and acreages by crop type were calculated using ArcMap and output to a shapefile. The database file was imported into SAS, and using a combination of NASS county-level crop yield and fertilizer usage rate estimates, the total truck trips resulting from agricultural production activities were estimated by township. SAS was used to convert these estimates to a specific file format utilized by Cube Voyager for use in development of a truck TDM.


AA19-SAS : Getting Started with Multilevel Modeling
Mike Patetta, SAS

In this presentation you will learn the basics of working with nested data, such as students within classes, customers within households, or patients within clinics through the use of multilevel models. Multilevel models can accommodate correlation among nested units through random intercepts and slopes, and generalize easily to 2, 3, or more levels of nesting. These models represent a statistically efficient and powerful way to test your key hypotheses while accounting for the hierarchical nesting of the design. The GLIMMIX procedure is used to demonstrate analyses in SAS.


AA20-SAS : Power and Sample Size Computations
John Castelloe, SAS

Sample size determination and power computations are an important aspect of study planning; they help produce studies with useful results for minimum resources. Application areas are diverse, including clinical trials, marketing, and manufacturing. This tutorial presents numerous examples using the POWER and GLMPOWER procedures in SAS/STAT® software to illustrate the components of a successful power and sample size analysis. The practitioner must specify the design and planned data analysis and choose among strategies for postulating effects and variability. The examples cover proportion tests, t tests, confidence intervals, equivalence and noninferiority, survival analyses, logistic regression, and repeated measures. The logistic regression example demonstrates the new CUSTOM statement in the POWER procedure that supports extensions of power analyses involving the chi-square, F, t, normal, and correlation coefficient distributions. Attendees will learn how to compute power and sample size, perform sensitivity analyses for factors such as variability and Type I error rate, and produce customized tables and graphs using the POWER and GLMPOWER procedures and the %POWTABLE macro in SAS/STAT software.
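As a minimal sketch of the kind of analysis the tutorial covers — here, solving for the per-group sample size of a two-sample t test at two target power levels (the numbers are illustrative, not from the paper):

```sas
/* Solve for n per group; a missing value (.) marks the quantity
   PROC POWER should compute. */
proc power;
   twosamplemeans test=diff
      meandiff  = 5
      stddev    = 12
      power     = 0.8 0.9
      npergroup = .;
run;
```

Swapping which parameter is set to missing turns the same step into a power computation or a sensitivity analysis.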


AA99 : Tips and Best Practices Using SAS® in the Analytical Data Life Cycle
Tho Nguyen, Teradata
Paul Segal, Teradata

Come learn some tips and best practices with SAS/ACCESS, SAS formats, data quality, DS2, model development, model scoring, Hadoop and Visual Analytics - all integrated with the data warehouse.


System Architecture and Administration

SY01 : Using Agile Analytics for Data Discovery
Bob Matsey, Teradata

Companies are looking for Agile/self-service solutions to run their SAS Analytics without delays from IT in a massively parallel Teradata database environment. They are looking for a seamless, open architecture for business users that allows them to use whatever tools they would like, while being able to manage and load all their various types of data for discovery from many different environments. They need the ability to quickly explore, prototype, and test new theories, allowing them to succeed or fail fast, all in a self-serve environment that does not depend on IT. This session is intended for all skill levels and backgrounds.


SY02 : Read SAS Metadata Content in SAS Enterprise Guide
Piyush Singh, TCS
Prasoon Sangwan, TCS
Ghiyasudin Khan, TCS

SAS® Management Console has long been the primary tool for interacting with SAS® Metadata Servers, providing a single interface for administrators to read and update metadata. Sometimes, however, users need more than SAS Management Console can provide: administrators and users come across many situations where they need metadata information in a format the console cannot deliver. This paper contains a few SAS macros that can be used in a SAS® Enterprise Guide or PC SAS® session to read SAS® Metadata content. The paper explains how these macros can be executed in SAS Enterprise Guide and how to modify them to meet other business needs. There may be third-party tools available to read SAS Metadata, but this paper shows how to achieve most of the same results from within SAS clients such as PC SAS and SAS Enterprise Guide.


Tools of the Trade

TT01 : Generating Reliable Population Rates Using SAS® Software
Jack Shoemaker, MDwise

The business of health insurance has always been to manage medical costs so that they don't exceed premium revenue. Monitoring and knowing about these patient populations will mean the difference between success and financial ruin. At the core of this monitoring are population rates like per member per month costs and utilization per thousand. This paper describes techniques using SAS® software that will generate these population rates for an arbitrary set of population dimensions. Keeping the denominators in sync with the numerators is key for implementing trustworthy drill-down applications involving population rates.
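The two rates the abstract names can be sketched in one PROC SQL step (table and variable names below are invented; the paper's dimension handling is more general):

```sas
/* Per-member-per-month cost and annualized admits per 1,000 members,
   with numerator and denominator aggregated in the same query so
   they stay in sync for any grouping dimension. */
proc sql;
   create table rates as
   select region,
          sum(paid_amt)   / sum(member_months)         as pmpm
             format=dollar10.2,
          sum(admissions) / sum(member_months) * 12000 as admits_per_1000
   from pop_summary
   group by region;
quit;
```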


TT02 : Check Please: An Automated Approach to Log Checking
Richann Watson, Experis

In the pharmaceutical industry, we find ourselves having to re-run our programs repeatedly for each deliverable. These programs can be run individually in an interactive SAS® session, which enables us to review the logs as we execute the programs. We could run the individual programs in batch and open each individual log to review for unwanted log messages, such as ERROR, WARNING, uninitialized, have been converted to, and so on. Both of these approaches are fine if there are only a handful of programs to execute. But what do you do if you have hundreds of programs that need to be re-run? Do you want to open every single one of the programs and search for unwanted messages? This manual approach could take hours and is prone to accidental oversight. This paper discusses a macro that searches a specified directory and checks either all the logs in the directory, only logs with a specific naming convention, or only the files listed. The macro then produces a report that lists all the files checked and indicates whether issues were found.
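The heart of the approach can be sketched in a few lines (the paper's macro adds directory search, file-name filtering, and a formatted report; the path below is hypothetical):

```sas
/* Scan every .log file in a folder for unwanted messages. */
data log_issues;
   length source $256 line $256;
   infile 'C:\project\logs\*.log' filename=logname truncover;
   length logname $256;
   source = logname;            /* FILENAME= variable is automatic,
                                   so copy it to keep it in the output */
   input line $char256.;
   if index(line, 'ERROR')
      or index(line, 'WARNING')
      or index(line, 'uninitialized')
      or index(line, 'converted to') then output;
run;
```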


TT03 : Arbovirus, Varicella and More: Using SAS® for Reconciliation of Disease Counts
Misty Johnson, State of WI-DHS

Communicable disease surveillance by the Centers for Disease Control (CDC) is dependent upon reporting by public health jurisdictions in the United States. Communicable disease reporting in the State of Wisconsin is facilitated by an electronic surveillance system that communicates directly with the CDC. Disease reports are often updated with more test results and information; each addition of information to a disease case results in a new message generated by the surveillance system. The State of Wisconsin reconciles disease reporting between cases acknowledged by the program epidemiologist and disease reports sent to the CDC by the surveillance system on an annual basis. A SAS 9.4 program utilizing simple DATA steps with by-group processing is used to easily determine which reports were processed and counted by the CDC. PROC TABULATE, used in conjunction with the Output Delivery System Portable Document Format (ODS PDF) destination makes it easy to produce line-lists and simple case counts by disease that are used by the epidemiologist to verify their records and counts are complete and in agreement with the CDC. This paper is meant for all levels of SAS programmers and demonstrates basic coding techniques to perform simple data cleaning and validation, followed by removal of redundant reports and ending with the creation of five different pairs of output reports. Intermediate coding techniques include the use of macro variables to assign input and output file names and paths, Access to PC Files to import an Excel® file and printing output to file using the ODS PDF destination.


TT04 : Code Like It Matters: Writing Code That's Readable and Shareable
Paul Kaefer, UnitedHealthcare

Coming from a background in computer programming to the world of SAS yields interesting insights and revelations. There are many SAS programmers who are consultants or work individually, sometimes as the sole maintainer of their code. Since SAS code is designed for tasks like data processing and analytics, SAS developers working on teams may use different strategies for collaboration than those used in traditional software engineering. Whether a programmer works individually, on a team, or on a project basis (delivering code and moving on to the next project), there are a number of best practices that can be leveraged to improve their SAS code. These practices make it easier to read, maintain, and understand/remember why the code is written the way it is. This paper presents a number of best practices, with examples and suggestions for usage. The reader is encouraged not to apply all the suggestions at once, but to consider them and how they may improve their work or the dynamic of their team.


TT05 : Quality Assurance Strategies for Analytic Code
Matthew Nizol, ArborMetrix

Custom analytic code is increasingly used to answer hard questions and support mission-critical decisions. However, such decisions may be unsound if the software driving the analysis is faulty. As such, quality assurance (QA) must be a vital component of the development process within an analytic organization. For optimal efficacy, QA needs to be a holistic activity that informs all steps of the software development lifecycle, from requirements gathering through final validation. Given the complexity of most analytic code and the variety of tasks that analysts are called to perform, determining how best to verify and validate that code can be overwhelming. This paper provides an overview of basic verification and validation strategies that are useful for analytic code: static verification via self and peer code review; dynamic verification via smoke testing, output sampling, unit testing, and double programming; and validation via specification review, impact analysis, and benchmark comparison. Moreover, recognizing that not all QA strategies are necessary (or appropriate) for all analytic tasks, this paper proposes a scenario-based approach to quality assurance planning so that analysts can optimize the time they spend on QA.


TT06 : From Device Text Data to a Quality Dataset
Laurie Bishop, Cincinnati Children's Hospital Medical Center

Data quality in research is important. It may be necessary for data from a device to be used in a research project. Often the data is read from an external text file and entered onto a CRF; then the data is read from the CRF and entered into a database. This process introduces many opportunities for data quality to be compromised. The quality of device data used in a study can be greatly improved if the data can be read from the device's output file directly into a dataset. If the device outputs results into a text file that can be saved electronically, SAS® can be used to read the needed data from the results and save it directly into a dataset. In addition to improving data quality, data collection and monitoring time can be reduced by taking advantage of these electronic files as opposed to recapturing the data on a CRF.


TT07 : Proc Transpose Cookbook
Doug Zirbel, Wells Fargo and Co.

Proc TRANSPOSE rearranges columns and rows of SAS datasets, but its documentation and behavior can be difficult to comprehend. For common input situations, this paper will show a variety of "what-you-have" and "what-you-want" examples, plus code and an easy reference card.
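One of the simplest "what-you-have / what-you-want" patterns, sketched here with hypothetical dataset and variable names, is the long-to-wide case: one row per subject and visit becomes one column per visit:

```sas
/* Hypothetical long-to-wide transposition. */
proc sort data=labs;
    by subject;
run;

proc transpose data=labs out=labs_wide prefix=visit_;
    by subject;          /* one output row per subject */
    id visit;            /* values of VISIT name the new columns */
    var result;          /* the measure being rearranged */
run;
```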


TT08 : Get Smart! Eliminate Kaos and Stay in Control - Creating a Complex Directory Structure with the DLCREATEDIR Statement, SAS® Macro Language, and Control Tables
Louise Hadden, Abt Associates Inc.

An organized directory structure is an essential cornerstone of data analytic development. Programmers involved in repetitive processing of any sort control their software and data quality with directory structures that can be easily replicated for different time periods, different drug trials, and so on. Practitioners (including the author) often use folder and subfolder templates or shells to create identical complex folder structures for new date spans of data or projects, or run a series of MKDIR and CHDIR commands from a command prompt, either manually or via external code submitted from within a SAS® process, to create logical folders. Desired changes have to be made manually, offering opportunities for human error. Since the advent of the DLCREATEDIR system option in SAS version 9.3, practitioners can create single folders (if they do not exist) from within a SAS process. Troy Hughes describes a process using SAS macro language, the DLCREATEDIR option, and control tables to facilitate and document the logical folder creation process. This paper describes a technique that wraps another layer of macro processing around that process, isolating and expanding the recursive logical folder assignment to create a complex, hierarchical folder structure used by the author for a project requiring monthly data intake, processing, quality control, and delivery of thousands of files. Analysis of the prior month's folder structure to inform development of control tables is discussed.
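The core mechanism the paper builds on can be sketched in a few lines (the folder path here is hypothetical): with DLCREATEDIR in effect, a LIBNAME statement creates the folder it points to if it does not already exist:

```sas
/* Create a folder (if absent) simply by assigning a libref to it.
   Only the lowest-level folder is created, which is why the paper
   wraps this in macro logic to build nested structures level by level. */
options dlcreatedir;
libname newdir 'C:\projects\2017_09\qc';
libname newdir clear;   /* the libref was needed only to trigger creation */
```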


TT09 : An array of possibilities: using arrays to manipulate longitudinal survey data
Lakhpreet Gill, Mathematica Policy Research

SAS arrays are extremely well-suited for handling longitudinal survey data, which are data collected at multiple time points for a sample population. Often, the observation window for individuals in the sample varies based on when respondents entered the study, and the research question itself can also shift the time period of interest. This paper relies on the Health and Retirement Study to identify differing baseline and follow-up measurements conditioned on age and response status. It demonstrates how to dynamically create an array index based on respondents' entry into and exit from the study, and how to use that index in a variety of ways to extract needed information from the survey data. Specifically, the programming topics covered are how to: create an index based on respondent-specific baseline and analysis waves; manipulate the index to select information for later waves; troubleshoot out-of-bounds cases; integrate the subscript with the index to populate time series variables; and merge wide and long data to look ahead and look across. This topic was presented as a PowerPoint presentation to the Michigan SAS Users Group and has since been used within Mathematica Policy Research to train research assistants who have been working for at least one year. The primary audience is therefore Intermediate SAS Users, as the paper assumes some basic knowledge of arrays. SAS 9.4 in Enterprise Guide 7.1 was used.
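A hypothetical fragment illustrating the core idea: store each respondent's entry wave in a variable and use it as a dynamic array subscript, guarding against out-of-bounds values:

```sas
/* Hypothetical: select baseline and follow-up values from wave
   variables using a computed array index. */
data analysis;
    set hrs_wide;
    array income{8} income_w1-income_w8;
    /* base_wave holds the wave at which the respondent entered */
    if 1 <= base_wave <= 8 then baseline_income = income{base_wave};
    if 1 <= base_wave + 2 <= 8 then followup_income = income{base_wave + 2};
run;
```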


TT10 : Fully Automated Updating of Arbitrarily Complex Excel Workbooks
David Oesper, Lands' End

You can generate some very sophisticated Excel workbooks using ODS EXCEL and ODS TAGSETS.EXCELXP, but sometimes you'll want to create your Excel workbook in Microsoft Excel, or someone else will provide it to you. I'll show you how you can use SAS to dynamically update (and distribute) any existing Excel workbook with no manual intervention required. You'll need only Base SAS 9.4 TS1M2 or later and SAS/ACCESS to PC Files to use this approach. Examples in both the Linux and Windows operating environments will be presented.
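One possible mechanism for this kind of update (a sketch only, not necessarily the author's exact method; the workbook path and sheet name are hypothetical) is the XLSX LIBNAME engine from SAS/ACCESS to PC Files, which lets a DATA step write one worksheet inside an existing workbook:

```sas
/* Hypothetical: replace one worksheet's data inside an existing
   workbook, leaving the other sheets in place. */
libname wb xlsx '/home/reports/monthly_summary.xlsx';

proc datasets lib=wb nolist;
    delete Summary;        /* drop the old sheet if it exists */
quit;

data wb.Summary;           /* rewrite the Summary sheet */
    set work.summary_data;
run;

libname wb clear;
```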


e-Posters

PO01 : Red Rover, Red Rover, Send Data Right Over: Exploring External Geographic Data Sources with SAS®
Louise Hadden, Abt Associates Inc.

The intrepid Mars Rovers have inspired awe and Curiosity - and dreams of mapping Mars using SAS/GRAPH®. This presentation will demonstrate how to import SHP file data (using PROC MAPIMPORT) from sources other than SAS and GfK to produce useful (and sometimes creative) maps. Examples will include mapping neighborhoods, ZCTA5 areas, postal codes and, of course, Mars. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.
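The import step itself is short; a hypothetical sketch for an external shapefile (the file and variable names are invented for illustration):

```sas
/* Hypothetical: import an ESRI shapefile into a SAS map dataset,
   then draw it with PROC GMAP. */
proc mapimport datafile='C:\gis\neighborhoods.shp' out=work.nbhd_map;
run;

proc gmap data=work.nbhd_map map=work.nbhd_map all;
    id nbhd_name;                 /* area identifier from the shapefile */
    choro nbhd_name / nolegend;   /* shade each area distinctly */
run;
quit;
```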


PO02 : SAS/GRAPH® and GfK Maps: a Subject Matter Expert Winning Combination
Louise Hadden, Abt Associates Inc.

SAS® has an amazing arsenal of relatively unknown and underutilized tools for using and displaying geographic information. High quality GfK Geocoding maps have been provided by SAS since SAS 9.3 M2, as sources of inexpensive map data dried up. SAS has been including both GfK and "traditional" SAS map data sets with SAS/GRAPH licenses for some time, recognizing the need for an extended transitional period. However, for those of us who have been putting off converting our SAS/GRAPH mapping programs to use the new GfK maps, the time has come, as the "traditional" SAS map data sets are no longer being updated. If you visit SAS MapsOnline, you will find only GfK maps among the current maps. The GfK maps are updated once a year. This presentation will walk through the conversion of a long-standing SAS program, which produces multiple US maps for a data compendium, to take advantage of GfK maps. Products used are Base SAS® and SAS/GRAPH®. SAS programmers of any skill level will benefit from this presentation.


PO03 : Data Quality Control: Using High Performance Binning to Prevent Information Loss
Lakshmi Nirmala Bavirisetty, Independent SAS User
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine

It is a well-known fact that the structure of real-world data is rarely complete and straightforward. Keeping this in mind, we must also note that the quality, assumptions, and base state of the data we are working with have a very strong influence on the selection and structure of the statistical model chosen for analysis and/or data maintenance. If the structure and assumptions of the raw data are altered too much, the integrity of the results as a whole is grossly compromised. The purpose of this paper is to provide programmers with a simple technique which will allow the aggregation of data without losing information. This technique will also check the quality of binned categories in order to improve the performance of statistical modeling techniques. The SAS® high performance analytics procedure, HPBIN, gives us a basic idea of syntax as well as various methods, tips, and details on how to bin variables into comprehensible categories. We will also learn how to check whether these categories are reliable and realistic by reviewing the WOE (Weight of Evidence) and IV (Information Value) for the binned variables. This paper is intended for SAS users of any level interested in quality control and/or SAS high performance analytics procedures.
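A hypothetical two-step sketch of the workflow the abstract describes: bin a variable with PROC HPBIN, then feed the bin mapping back in to compute WOE and IV (dataset and variable names are invented for illustration):

```sas
/* Hypothetical: quantile-bin AGE into 10 groups, then compute Weight
   of Evidence and Information Value against a binary target. */
proc hpbin data=work.loans numbin=10 quantile;
    input age;
    ods output Mapping=work.age_mapping;   /* save the bin boundaries */
run;

proc hpbin data=work.loans woe bins_meta=work.age_mapping;
    target default / level=binary order=desc;
run;
```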


PO04 : Clinical and genetic biomarkers as moderators of fat gain in antipsychotic treated older adults
Hanadi Ajam Oughli, Washington University in Saint Louis
Dave Dixon, Washington University in Saint Louis, Department of Psychiatry
Mark Mueller, Washington University in Saint Louis, Department of Psychiatry
Ginger Nicol, Washington University in Saint Louis, Department of Psychiatry
Eric Lenze, Washington University in Saint Louis, Department of Psychiatry

Using PROC FACTOR to Condense Individual Moderators Into a Single Combined Moderator

Factor analysis is a process in which a set of variables is summarized into a few dimensions, called latent variables or factors, by identifying correlations among the selected variables. The FACTOR procedure, used for factor analysis, has more recently been applied in clinical research analyses to create combined moderators that represent a weighted sum of a larger number of individual moderators. In this example, using data collected from a large NIMH-funded, multi-site, randomized clinical trial that examined the efficacy and tolerability of aripiprazole augmentation in treatment-resistant late-life depression, we created a combined moderator (Mc) from 26 individual moderators of total fat gain associated with aripiprazole augmentation. The 26 moderators included variables on patients' demographic characteristics, depression clinical characteristics, and cardiometabolic indices. The Spearman correlation effect size was calculated for each individual moderator. A factor analysis with varimax rotation, conducted in SAS 9.4, was then used to identify orthogonal latent moderators among the 26 moderators; a total of 10 factors were obtained. A single moderator from each factor was then selected, based on its effect size, factor loading, and clinical significance, to represent that factor. These moderators were then used to create the combined clinical moderator (Mc), which ultimately had the largest effect size. We have outlined this methodology and present it here. We also discuss the potential of incorporating genetic information into the combined moderator, which could be used in the future for treatment decision making.
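A minimal sketch of the factor-analysis step described above (dataset and variable names are hypothetical):

```sas
/* Hypothetical: extract 10 orthogonal factors from 26 candidate
   moderators using a varimax rotation. */
proc factor data=work.moderators nfactors=10 rotate=varimax scree
            out=work.factor_scores;
    var mod1-mod26;
run;
```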


PO05 : Let's Get FREQy with our Statistics: Let SAS® Determine the Appropriate Test Statistic Based on Your Data
Lynn Mullins, PPD
Richann Watson, Experis

As programmers, we are often asked to program statistical analysis procedures to run against the data. Sometimes the specifications we are given by the statisticians will outline which statistical procedures to run. But other times, the statistical procedures to use need to be data dependent. To run these procedures based on the results of previous procedures' output requires a little more preplanning and programming. We will present a macro that will dynamically determine which statistical procedure to run based on previous procedure output, as well as, allowing the user to input parameters (such as primvar, secvar, chifsh, trend, trendier, and binci) and the macro will return counts, percents and appropriate p-value for Chi vs. Fisher and p-value for Trend and Binomial CI, if applicable.