
MWSUG 2019 Paper Presentations

Paper presentations are the heart of a SAS users group meeting. MWSUG 2019 will feature dozens of paper presentations organized into several academic sections covering a variety of topics and experience levels.

Note: Content and schedule are subject to change. Last updated 03-Jul-2019.



Business Leadership

Paper No. Author(s) Paper Title (click for abstract)
BL-006 Kirk Paul Lafler Differentiate Yourself
BL-066 Richann Watson
& Louise Hadden
Are you Ready? Preparing and Planning to Make the Most of your Conference Experience
BL-068 Sreejita Biswas
& Miriam Mcgaugh
Taxi Ride Prediction: Does the Yellow Cab Supply Meet Customer Demands?
BL-074 Prajakta Pai
& Miriam Mcgaugh
Impact of Aging Population on Social Security and Underlying Trends
BL-076 Manasi Murde
& Miriam Mcgaugh
Why Admitted Students Opt-out of College Enrollment?
BL-101 Steven Myers Exploring and characterizing time series data in a non-regression based approach
BL-104 Steven Myers Explore your data before you rush to analysis, you will thank me later: explorations in cross section data


Hands On Workshop

Paper No. Author(s) Paper Title (click for abstract)
HW-007 Kirk Paul Lafler Powerful and "Sometimes" Hard-to-find PROC SQL Features
HW-020 Xiaoting Wu Survival Tips for Survival Analysis
HW-022 Jayanth Iyengar Understanding Administrative Healthcare Data sets using SAS programming tools
HW-037 Troy Hughes Parallel Processing Your Way to Faster Software and a Big Fat Bonus: Demonstrations in Base SAS®
HW-050 Josh Horstman Doing More with the SGPLOT Procedure
HW-063 Richann Watson
& Kriss Harris
Interactive Graphs
HW-086 Kent Phelps
& Ronda Phelps
Base SAS® & SAS® Enterprise Guide® Automate Your SAS® World with Dynamic Code ~ Forwards & Backwards
HW-094 Ben Cochran Not Even One Single Solitary Semicolon: Powerful SAS Things You Can Do Without Writing Programs


Industry

Paper No. Author(s) Paper Title (click for abstract)
IN-009 Samuel Berestizhevsky
& Tanya Kolosova
Creating of Consumer and Product Profile by using Polytomous Rasch Measurement Model and Relational Bayesian Networks
IN-014 Doug Thompson Comparison of three methods for transforming predictor variables to improve model fit using SAS
IN-015 Peter Flom Scatterplots: Basics, enhancements, problems and solutions.
IN-019 Xiaoting Wu Time-To-Event Analysis in the Presence of Competing risks
IN-021 Kirk Paul Lafler Exploring the Skills Needed by the Data Science / Analytics Professional
IN-036 Troy Hughes From FREQing Slow to FREQing Fast: Facilitating a Four-Times-Faster FREQ with Divide-and-Conquer Parallel Processing
IN-044 Andrea Mclain The SAS EG Process Flow: A Customizable Data Mining Tool in the Search for Healthcare Fraud
IN-061 Lynn (Xiaohong) Liu
& Roderick Jones
Leveraging RAREEVENTS Procedure Options to Monitor and Evaluate Infrequent Events in Healthcare
IN-064 Thu Dinh et al. Detecting Side Effects and Evaluating Effectiveness of Drugs from Customers' Online Reviews using Text Analytics and Data Mining Models
IN-080 Lin Qi et al. Do undergraduates need prerequisites for common courses?
IN-081 Benjamin Cronk Using SAS to predict the occurrence of study milestones used to initiate planned interim analysis
IN-083 Sai Gopi Krishna Govindarajula
& Miriam Mcgaugh
Classifying Risk in Life Insurance using Predictive Analytics
IN-087 Dan Dewitz Tuning Tufte: creating minimalist data visualizations with SG Plot
IN-088 Nancy Brucken Timing is Everything: Defining ADaM Period, Subperiod and Phase
IN-091 Zhixin Lun
& Ravindra Khattree
Simulating Skewed Multivariate Distributions Using SAS: The Cases of Lomax, Mardia's Pareto (Type I), Logistic, Burr and F Distributions
IN-092 Tao Shu The Optimal Flight Ticket Price Model Based on Bivariate Normal Distribution
IN-095 Michael Wise
& Soumya Rajesh
Findings About: De-mystifying the When and How
IN-113 Michael G. Wilson Sample Size and Design Considerations in Studies Assessing Non-Inferiority using Continuous Outcomes


Rapid Fire

Paper No. Author(s) Paper Title (click for abstract)
RF-003 Kirk Paul Lafler A Visual Step-by-step Approach to Converting an RTF File to an Excel File
RF-004 Kirk Paul Lafler Saving and Restoring Startup (Initialized) SAS® System Options
RF-017 Brooke Ellen Delgoffe 3 ways to get Pretty Excel-Style Tables: PROC REPORT, PROC TABULATE, and Help from EG
RF-025 Bill Qualls Using SAS to recreate Mike Bostock's creation of an E.J. Marey-inspired Rail Traffic Plot
RF-029 Michael Harper Using the XLSX libref engine with metadata available in Dictionary Tables
RF-033 Troy Hughes The Doctor Ordered a Prescription, Not a Description: Driving Dynamic Data Governance Through Prescriptive Data Dictionaries That Automate Quality Control and Exception Reporting
RF-034 Troy Hughes Abstracting and Automating Hierarchical Data Models: Leveraging the SAS® FORMAT Procedure CNTLIN Option To Build Dynamic Formats That Clean, Convert, and Categorize Data
RF-038 Robert G. Downer Evaluate your SCORE: Logistic regression prediction comparison using the SCORE statement
RF-042 Richard Spotswood Fuzzy Matching Commercial Entity Names
RF-058 Louise Hadden Like, Learn to Love SAS® Like
RF-060 Louise Hadden
& Troy Hughes
DOMinate your ODS Output with PROC TEMPLATE, ODS Cascading Style Sheets (CSS), and the ODS Document Object Model (DOM)
RF-067 Raj Laxmi Prakash
& Miriam Mcgaugh
Breaking Human Trafficking Network: An Analytics Approach
RF-069 Sai Teja Sagi
& Miriam Mcgaugh
The Advent of Renewable Energy
RF-077 Kathryn Schurr Utilizing Macros to Create Patient Site Matching via Zip-Code Radiuses
RF-079 Harish Reddy Patlolla
& Miriam Mcgaugh
US Airline Passenger Satisfaction using SAS Enterprise Miner
RF-100 Katelyn Ware
& Rachel Baxter
Surviving Survival Analysis 101: Making the Likelihood Ratio Test Easier Using a Macro
RF-105 Laurie Smith Comparing Dates without an Array
RF-108 Stephanie Thompson What Not to Do in a Program Used with %include
RF-109 Stephanie Thompson 10 Cool Things You Can Do in a DATA STEP


SAS 101 Plus

Paper No. Author(s) Paper Title (click for abstract)
SP-002 Kirk Paul Lafler SAS® Macro Programming Tips and Techniques
SP-005 Kirk Paul Lafler SAS® Performance Tuning Techniques
SP-012 Zeke Torres PROC FORMAT with HTML - for useful Drill Down output in Web and/or Excel
SP-018 George Vineyard Utilizing SAS Macros, Do Loops and ODS to produce automated production quality individual profiles
SP-026 Bruce Lund Logistic Regression, Basics and Beyond
SP-031 Troy Hughes User-Defined Multithreading with the SAS® DS2 Procedure: Performance Testing DS2 Against Functionally Equivalent DATA Steps
SP-043 Jayanth Iyengar
& Josh Horstman
Look up not down: Advanced Table Lookup Techniques in BASE SAS
SP-052 Josh Horstman Fifteen Functions to Supercharge Your SAS® Code
SP-053 Josh Horstman Using Macro Variable Lists to Create Dynamic Data-Driven Programs
SP-057 Louise Hadden Using ODS Trace (DOM), Procedural Output and ODS Output Objects to Create the Output of Your Dreams
SP-065 Richann Watson
& Louise Hadden
Quick, Call the "FUZZ": Using Fuzzy Logic
SP-072 LeRoy Bessler Powerful SAS® Output Delivery with ODS EXCEL
SP-093 Ben Cochran Urge to Merge? Maybe You Should Update Instead.
SP-110 Arthur Li Creating In-line Style Macro Functions


e-Poster

Paper No. Author(s) Paper Title (click for abstract)
PO-023 Derek Grittmann
& Adam Hendricks
Creating a True LSF Batch Job Submission Capability on SAS EG in a SAS Grid
PO-028 Jose Centeno Generating SAS Datasets from ASCII Files Using a Crosswalk
PO-030 Troy Hughes Badge in Batch with Honeybadger: Generating Conference Badges with Quick Response (QR) Codes Containing Virtual Contact Cards (vCards) for Automatic Smart Phone Contact List Upload
PO-040 Venkateswarlu Toluchuri Configuration and Usage of SASPy on Grid 9.4
PO-045 Mario Tejada Have Your SAS Program and Schedule It Too!
PO-070 Varsha Ganagalla
& Daniel Adrian
Levels Do Count - A New Dimension To The Interaction Effect In 3-way Factorial Analysis
PO-071 Hai Nguyen Frequency matching case-control techniques: an epidemiological perspective
PO-082 Ted Conway Oh, There's No Place Like SAS ODS Graphics for the Holidays!
PO-097 Abigail Zysk
& Kylie Springer
Exploring Wine Reviews: How Language and Word Use Varies in Wine Reviews
PO-098 Anne Cain-Nielsen
& Scott Regenbogen
Profiling hospital length of stay using the mode
PO-099 Laurie Smith Seeing the Things We Love with SAS




Abstracts

Business Leadership

BL-006 : Differentiate Yourself
Kirk Paul Lafler, Software Intelligence Corporation

Today's job, employment, contracting, and consulting marketplace is highly competitive. As a result, SAS® professionals should do everything they can to differentiate and prepare themselves for the global marketplace by acquiring and enhancing their technical and soft skills. Topics include assessing and enhancing existing skills using an assortment of valuable, and "free", SAS-related content; becoming involved, volunteering, publishing, and speaking at in-house, local, regional, and international SAS user group meetings and conferences; and publishing blog posts, videos, articles, and PDF "white" papers to share knowledge and stand out from the competition.


BL-066 : Are you Ready? Preparing and Planning to Make the Most of your Conference Experience
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

Whether you are a first-time or a long-time conference attendee, this paper can help you get the most out of your conference experience. As long-time conference attendees and volunteers, we have found that there are some things people just don't think about when planning their conference attendance. In this paper we discuss helpful tips such as making the appropriate travel arrangements, what to bring, networking and meeting up with friends and colleagues, and how to prepare for your role at the conference. We also discuss maintaining a workplace presence with your paying job while at the conference.


BL-068 : Taxi Ride Prediction: Does the Yellow Cab Supply Meet Customer Demands?
Sreejita Biswas, Oklahoma State University, Stillwater
Miriam Mcgaugh, Oklahoma State University

New York is the taxi capital of America and home to the classic yellow taxicab. It would benefit taxi companies and customers alike if rides were available whenever a customer needs one. To achieve this level of service, it is important to know how different factors affect the number of rides, a process complicated by external forces such as weather. Because of pricing strategies employed by other cab companies, such as surge pricing, customers are always on the hunt for affordably priced rides at the time of need. This paper attempts to predict the demand for a yellow taxi at a particular location, on a particular day, and at a particular time, which will help estimate the number of taxis that should be present at any given time or place. The project focuses on New York yellow taxis dispatched from a central facility in 2018, and will help to understand and predict the demand for and supply of yellow taxicabs, improving the process for both customer satisfaction and the taxi industry. Six months' worth of data (Jan 2018 - June 2018) from the New York City Taxi and Limousine Commission and from Weather Underground were obtained, including pick-up/drop-off locations, time/date, distance, payment source, temperature, wind speed, and precipitation levels. Because of the variation in weather over the six months, multiple models will be built in SAS® Forecast Studio to predict demand and supply.


BL-074 : Impact of Aging Population on Social Security and Underlying Trends
Prajakta Pai, Oklahoma State University
Miriam Mcgaugh, Oklahoma State University

For years Social Security has been a major source of income for retired individuals and their families. Social Security was developed as an anti-poverty program that focuses on retired workers, disabled individuals, and survivors of workers. Over the years, factors such as improved longevity, education, and the retirement of baby boomers have strained the Social Security reserves. Research suggests that Social Security benefits will soon be reduced because there are more beneficiaries within the system. Currently, over 61 million beneficiaries are paid each month. However, with the retirement of the baby boomer population, the Social Security funds will see an increase in payouts and a decrease in income. It is imperative to examine trends in the population versus individuals who will draw Social Security upon retirement, in order to identify factors that may contribute to a deficit or surplus in Social Security funds in the coming years. This project examined trends in Social Security beneficiaries and the aging population, while also looking at trends among the non-resident population in each state. Based on these data, we will investigate whether changes in the population of non-resident workers have an effect on Social Security funds in a specific region of the country. SAS Enterprise Miner will be used to explore US Census and Social Security Administration data. The expectation is that the population of non-resident workers will contribute to increasing Social Security funds and reduce the disparity from the aging population.


BL-076 : Why Admitted Students Opt-out of College Enrollment?
Manasi Murde, Oklahoma State University
Miriam Mcgaugh, Oklahoma State University

Understanding and improving student enrollment has always been important for departments across universities. It is imperative for universities to have a firm grip on enrollment of prospects, applicants, and admitted students. Research indicates that one out of every three admitted students does not enroll in that college. One of the prime reasons universities must concentrate on increasing their enrollment rate is to avoid loss of revenue. Admitted but non-enrolled students have a negative financial impact on a university through lost tuition and money expended on marketing and recruitment. Non-enrollment also results in a loss of academic potential. There can be numerous reasons why admitted students fail to enroll at a particular university, which may be specific to one university or common among many schools. Therefore, each school must identify any existing trends within its own unique admitted non-enrolled student population. This research concentrates on analyzing data to identify factors and indicators for why students fail to enroll after accepting admission to Oklahoma State University, and on predicting non-enrollment based on those factors. The data for this analysis include email communications, text messages, demographics, and admission process timestamps. Preliminary analysis indicates that non-enrollment is higher among first-generation students and academically talented students. Results from this research will help the Marketing Department of Oklahoma State University improve undergraduate enrollment by understanding non-enrollment rates across demographics and the communications and factors that lead to non-enrollment.


BL-101 : Exploring and characterizing time series data in a non-regression based approach
Steven Myers, The University of Akron

Business leaders as well as data analysts and data scientists need to understand the particularities of time series data. This paper reports on an introduction to time series as taught to students in a first business analytics course, making use of data from FRED, the marvelous time series repository at the Federal Reserve Bank of St. Louis. Students are cautioned not to run to advanced techniques before stopping to fully explore the data; this approach is designed to instill an EDA mentality in students while teaching them how to manipulate and characterize time series data in SAS, thereby laying the groundwork for more advanced work in time-series econometrics, forecasting, and predictive analytics. Also instilled in the students is an appreciation of knowing the data generating process. SAS programming is taught through this approach, focusing on SAS functions such as DIF and LAG and PROCs CORR, MEANS, and SGPLOT. The paper concludes with basic coverage of the random walk and the spurious correlation that can easily result in economic time series data when one does not first investigate data stationarity.
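The DIF and LAG functions mentioned in the abstract can be illustrated with a minimal sketch; the data set and variable names here are hypothetical, not from the paper:

```sas
/* Illustrative sketch: computing a lag and a first difference
   for a time series. Assumes a hypothetical data set GDP_SERIES
   with variables DATE and GDP, sorted by DATE. */
data growth;
   set gdp_series;
   gdp_lag = lag(gdp);   /* value of GDP from the previous observation */
   gdp_chg = dif(gdp);   /* first difference: gdp - lag(gdp) */
run;
```

Differencing a series this way is the usual first step in checking for stationarity before any regression-based analysis.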


BL-104 : Explore your data before you rush to analysis, you will thank me later: explorations in cross section data
Steven Myers, The University of Akron

Economists, business leaders, and analysts spend a great deal of time analyzing structured cross-sectional data. This paper is an introduction to exploratory data analysis for economic and business data analytics students in an introductory economics course, teaching data handling and SAS programming and featuring PROCs MEANS, SGPLOT, FREQ, CORR, REG, and TABULATE. A dataset on rents paid is used to illustrate the solution to the problem: do women pay higher rents on a college campus? It is important to learn all you can about your data before rushing to analysis, yet students typically rush to more advanced and fancier techniques. In this paper we show how to ground the analysis in a firm understanding of the data generating process and suggest many ways to learn about the underlying data. Additional data are introduced to illustrate the problems of data cleaning and manipulation in large samples, using the effect of economic freedom on standards of living worldwide as an example; the paper concludes with the steps in the process to reveal causal patterns in that data. The experience of students is highlighted.


Hands On Workshop

HW-007 : Powerful and "Sometimes" Hard-to-find PROC SQL Features
Kirk Paul Lafler, Software Intelligence Corporation

The SQL procedure contains many powerful and elegant language features for intermediate and advanced SQL users. This hands-on workshop presents topics that will help SAS users unlock the many powerful features, options, and other gems found in the SQL universe. Topics include using CASE logic to assign new values; a sampling of summary (statistical) functions; identifying FIRST, LAST, and BETWEEN rows in BY-groups; accessing metadata from Dictionary tables; creating single-value and value-list macro variables using the PROC SQL and macro interface; performing two-table joins and discussing the four available join algorithms; and using the PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103 to better understand what the SQL optimizer does with a query.
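A few of the features named above can be sketched briefly; this is an illustrative example against the SASHELP.CLASS sample table, not material from the workshop itself:

```sas
proc sql;
   /* CASE logic to assign a new value */
   select name,
          case when age < 13 then 'Child'
               else 'Teen'
          end as age_group
     from sashelp.class;

   /* Create a single-value macro variable via the macro interface */
   select avg(height) into :avg_height trimmed
     from sashelp.class;

   /* Access metadata from a Dictionary table */
   select name, type, length
     from dictionary.columns
    where libname = 'SASHELP' and memname = 'CLASS';
quit;
%put Average height: &avg_height;
```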


HW-020 : Survival Tips for Survival Analysis
Xiaoting Wu, University of Michigan

Survival analysis is a common type of analysis in the health care field. This hands-on tour will convey some survival tips for survival analysis. We will provide an overview of survival analysis using the SAS LIFETEST and PHREG procedures, including data preparation and visualization, variable selection, model specification, model validation, and output interpretation. We will also showcase some advanced applications, such as obtaining prediction estimates and customizing and outputting plots from the SAS PHREG procedure.


HW-022 : Understanding Administrative Healthcare Data sets using SAS programming tools
Jayanth Iyengar, Data Systems Consultants LLC

Changes in the healthcare industry have highlighted the importance of healthcare data. The volume of healthcare data collected by healthcare institutions, such as providers and insurance companies, is massive and growing exponentially. SAS programmers need to understand the nuances and complexities of healthcare data structures to perform their responsibilities. There are various types and sources of administrative healthcare data, including healthcare claims (Medicare, commercial insurance, and pharmacy), hospital inpatient, and hospital outpatient data. This training seminar will give attendees an overview and detailed explanation of the different types of healthcare data, and the SAS programming constructs to work with them. The workshop will engage attendees with a series of SAS exercises using healthcare datasets.


HW-037 : Parallel Processing Your Way to Faster Software and a Big Fat Bonus: Demonstrations in Base SAS®
Troy Hughes, Datmesis Analytics

SAS® software and especially extract-transform-load (ETL) systems commonly include components that must be serialized due to real process dependencies. For example, a transform module often cannot begin until the data extraction completes, and a corresponding load module cannot begin until the data transformation completes; thus, the E, T, and L must occur in sequence. Although process dependencies such as these cannot be avoided in many cases and necessitate serialized software design, in other cases, programs or data can be distributed across two or more SAS sessions to be processed in parallel, facilitating significantly faster software. This text introduces the concept of false dependencies, in which software is serialized by (poor) design rather than necessity, thus needlessly increasing execution time and degrading performance. Three types of false dependencies are demonstrated, as well as distributed software solutions that eliminate false dependencies through parallel processing, arming SAS practitioners to accelerate both their software and salaries.
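One common Base SAS mechanism for running independent programs in parallel sessions is the SYSTASK statement. The following is a minimal sketch under assumed conditions (a command-line SAS installation; the program paths are hypothetical), not the paper's own design:

```sas
/* Illustrative sketch: launch two independent SAS programs in
   parallel batch sessions, then wait for both to finish before
   the dependent downstream step runs. */
systask command "sas -sysin extract_part1.sas" taskname=t1;
systask command "sas -sysin extract_part2.sas" taskname=t2;
waitfor _all_ t1 t2;   /* block until both tasks complete */
```

If the two extracts have no true dependency on each other, serializing them in one session is exactly the kind of false dependency the paper describes.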


HW-050 : Doing More with the SGPLOT Procedure
Josh Horstman, Nested Loop Consulting

Once you've mastered the fundamentals of using the SGPLOT procedure to generate high-quality graphics, you'll certainly want to delve into the extensive array of customizations available. This workshop will move beyond the basic techniques covered in the introductory workshop. We'll go through more complex examples such as combining multiple plots, modifying various plot attributes, customizing legends, and adding axis tables.


HW-063 : Interactive Graphs
Richann Watson, DataRich Consulting
Kriss Harris, SAS Specialists Ltd

This paper demonstrates how you can use interactive graphics in SAS® 9.4 to assess and report your safety data. The interactive visualizations you will be shown include adverse event and laboratory results. In addition, you will be shown how to display "details-on-demand" when you hover over a point. Adding interactivity to your graphs will bring your data to life and help improve lives!
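One standard SAS 9.4 mechanism for "details-on-demand" hover text is an HTML image map with the TIP= option. This sketch uses hypothetical variable and file names (a lab data set with study day, value, subject, and parameter), not the paper's code:

```sas
/* Illustrative sketch: hover tooltips in HTML output */
ods graphics on / imagemap=on;        /* enable hover regions */
ods html5 file='lab_plot.html';
proc sgplot data=adlb;
   scatter x=ady y=aval / tip=(usubjid paramcd aval);  /* shown on hover */
run;
ods html5 close;
```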


HW-086 : Base SAS® & SAS® Enterprise Guide® Automate Your SAS® World with Dynamic Code ~ Forwards & Backwards
Kent Phelps, Illuminator Coaching, Inc.
Ronda Phelps, Illuminator Coaching, Inc.

Communication is the basic foundation of all relationships, including our SAS relationship with the server, PC, or mainframe. To communicate more efficiently ~ and to increasingly automate your SAS world ~ you will want to learn how to transform static code into dynamic code that automatically re-creates the static code, and then executes the re-created static code automatically. Our presentation highlights the powerful partnership that occurs when dynamic code is creatively combined with a dynamic FILENAME statement, macro variables, the SET INDSNAME option, and the CALL EXECUTE command within one SAS Enterprise Guide Program node. You have the exciting opportunity to learn how to design dynamic code forwards and backwards to re-create static code while automatically changing the year as 1,574 time-consuming manual steps are amazingly replaced with only one time-saving dynamic automated step. We invite you to attend our Dynamic Code Presentation, in which we detail the UNIX and Microsoft Windows syntax for our project example and introduce you to your newest BFF (Best Friend Forever) in SAS. Please see the appendixes to review additional starting-point information about the syntax for IBM z/OS, and to review the source code that created the data sets for our project example.
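The CALL EXECUTE routine mentioned above is the core of this kind of code generation. As a minimal sketch, with a hypothetical control data set YEARS (one row per YEAR value) driving one generated step per row:

```sas
/* Illustrative sketch: a DATA _NULL_ step writes and queues one
   PROC PRINT per year found in the control data; the generated
   code executes automatically after this step ends. */
data _null_;
   set years;
   call execute(cats('proc print data=sales_', year, '; run;'));
run;
```

The dynamic code "re-creates" what would otherwise be many hand-maintained static steps, which is the time-saving pattern the presentation describes at much larger scale.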


HW-094 : Not Even One Single Solitary Semicolon: Powerful SAS Things You Can Do Without Writing Programs
Ben Cochran, The Bedford Group

This presentation starts by illustrating how to convert different kinds of data into SAS data sets. Specifically, Excel spreadsheets and Microsoft Access tables are converted into SAS data. Then, these two data sources are joined with an existing SAS data set. Finally, a series of graphical and tabular reports are generated from the combined data. All of these tasks are completed without writing any SAS programs.


Industry

IN-009 : Creating of Consumer and Product Profile by using Polytomous Rasch Measurement Model and Relational Bayesian Networks
Samuel Berestizhevsky, InProfix Inc
Tanya Kolosova, Co-author

Consumer and Product Profiles are blueprints for identifying consumers' attitudes toward product attributes. They outline the critical attributes of products to be offered to consumers, aligned with consumer preferences to assure that the product satisfies consumers' needs now and will be demanded in the future. The Consumer and Product Profile development process comprises gathering data via simple surveys about product attributes and consumer preferences. Analysis of this data results in a Consumer and Product Profile that describes the differentiating consumer preferences regarding specific product attributes. Unfortunately, this survey data is often inappropriately analyzed, leading to wrong assessments of Consumer and Product Profiles, incorrect inferences about consumer preferences and product attribute strengths, and misleading recommendations on how to improve the product. We developed innovative mathematical approaches, algorithms, and software solutions that not only help to overcome the problems with analysis of consumer and product surveys but also help to build Consumer and Product Profiles in a fully automated and scalable way. Our solutions provide accurate and reliable information about the preferences of an individual consumer and her/his perception of product attributes, and eventually create an accurate quantitative estimation of Consumer and Product Profiles. Product designers, manufacturers, and retailers can use Consumer and Product Profiles to create products that meet customers' needs and expectations, create hyper-targeted marketing campaigns, personalize and optimize product prices, etc. SAS/Base and SAS/STAT are the modules used in our development.


IN-014 : Comparison of three methods for transforming predictor variables to improve model fit using SAS
Doug Thompson, Rush Health

Continuous and ordinal predictor variables are common in predictive modeling (e.g., age in years, medical expenditures last year). Often, such variables are non-linearly related to the predictive modeling target. To maximize the accuracy of a predictive model, non-linear associations need to be taken into account and included in the final model when appropriate. There seems to be no consensus on how best to detect and quantify non-linear associations when building predictive models. Several methods have been proposed in the literature, including cubic splines and exploring a wide variety of functional forms and then selecting the best-fitting via stepwise techniques. Although multivariate adaptive regression splines (MARS) and similar methods are often viewed as a stand-alone technique for predictive modeling, these techniques could also be used for exploring non-linear associations that are then included in a final model constructed using some other modeling technique (e.g., logistic regression or neural networks). The purpose of this paper is to illustrate three possible methods for exploring non-linear associations in predictive modeling using SAS: cubic splines, MARS, and stepwise selection of the best fitting of exploratory functional forms. A SAS macro is described, facilitating easy implementation and evaluation of each of these techniques. The techniques illustrated require only SAS/STAT (particularly PROCs ADAPTIVEREG, LOGISTIC, and SGPLOT). The audience is assumed to have intermediate familiarity with predictive modeling. The techniques are illustrated in a context that is common and important within the healthcare industry: predicting which patients will have relatively high healthcare expenditures next year.
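The MARS-style exploration referenced above is available in SAS/STAT through PROC ADAPTIVEREG. This is a generic sketch with hypothetical data set and variable names (a binary high-cost indicator and a continuous predictor), not the macro described in the paper:

```sas
/* Illustrative sketch: let adaptive regression splines find knots
   in the relationship between a binary target and a continuous
   predictor, as an exploratory step before a final model. */
proc adaptivereg data=claims plots=all;
   model high_cost = age / dist=binomial;
run;
```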


IN-015 : Scatterplots: Basics, enhancements, problems and solutions.
Peter Flom, Peter Flom Consulting

The scatter plot is a basic tool for presenting information on two continuous variables. While the basic plot is good in many situations, enhancements can increase its utility. I also go over tools to deal with the problem of overplotting. SAS, any operating system or version, appropriate for all levels.


IN-019 : Time-To-Event Analysis in the Presence of Competing risks
Xiaoting Wu, University of Michigan

Competing risks are a common phenomenon in time-to-event analysis. A competing risk may take place before the event of interest and thus exclude the possibility of the event occurring. For example, in the study of artificial heart valve duration, death is a competing risk, as it removes a patient's chance of receiving a potential reoperation due to valve deterioration. Ignoring competing risks, for example by using standard Kaplan-Meier estimators, will result in biased estimates for the event of interest. The cumulative incidence function, which estimates the probability of the event of interest over time, and the cause-specific hazard function, which models the effect of covariates on the event of interest, are the two main approaches to time-to-event analysis in the presence of competing risks. This paper demonstrates the rationale, implementation, and interpretation of these methods, with SAS applications using the %CIF macro and the LIFETEST and PHREG procedures.
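In recent SAS/STAT releases, PROC PHREG can fit a competing-risks (Fine-Gray subdistribution) model directly via the EVENTCODE= option. The following sketch uses hypothetical data and variable names in the spirit of the heart-valve example (status 1 = reoperation, 2 = death as competing risk, 0 = censored); it is not the paper's code:

```sas
/* Illustrative sketch: subdistribution hazard model for the event
   of interest in the presence of a competing risk. */
proc phreg data=valves;
   class treatment;
   model years*status(0) = treatment age / eventcode=1;
run;
```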


IN-021 : Exploring the Skills Needed by the Data Science / Analytics Professional
Kirk Paul Lafler, Software Intelligence Corporation

As 2.5 quintillion bytes (a 1 with 18 zeros) of new data are created each and every day, the age of big data has taken on new meaning, with a renewed sense of urgency to prepare students, young professionals, and other workers across job functions for today's and tomorrow's analytics roles, along with the necessary analytical skills to tackle growing data demands. With many organizations embracing Data Science / Analytics skills and tools, LinkedIn, a leading professional networking and employment-oriented website and app, found that Data Scientist positions saw a 56% increase in the US job market in 2018. To keep up with the huge demand for analytics talent in 2019 and beyond, many colleges, universities, and training organizations offer comprehensive Data Science / Analytics degree and certificate programs to fulfill the increasing demand for analytical skills. This presentation explores the skills needed by the Data Science / Analytics professional, including critical thinking; statistical programming languages such as SAS®, R, or Python; Structured Query Language (SQL); Microsoft Excel; and data visualization.


IN-036 : From FREQing Slow to FREQing Fast: Facilitating a Four-Times-Faster FREQ with Divide-and-Conquer Parallel Processing
Troy Hughes, Datmesis Analytics

With great fanfare, the release of SAS® 9 delivered multithreaded processing to a single-threaded SAS world. Procedures such as SORT, SQL, and MEANS could now run faster by taking advantage more fully of system resources through parallel processing paradigms. Multithreading commonly implements divide-and-conquer methodologies in which data sets or data streams are decomposed into subsets and processed in parallel rather than in series. Multithreaded solutions are faster (but typically not more efficient) than their single-threaded counterparts because execution time (but not system resource utilization) is decreased. As the costs of memory and processing power have continued to decrease, however, there remains no excuse for not implementing multithreaded processing wherever possible. To this end, and because SAS unfortunately abandoned some hapless procedures in single-threaded Sheol, this text aims to reunite the single-threaded FREQ procedure with its multithreaded bedfellows. The FREQFAST macro is introduced and espouses divide-and-conquer parallel processing that performs a frequency analysis more than four times faster than the out-of-the-box FREQ procedure. Non-environmental factors affecting FREQ performance (e.g., number of observations, number of unique observations, file size) are elucidated and modeled to demonstrate and predict performance improvement delivered through FREQFAST.


IN-044 : The SAS EG Process Flow: A Customizable Data Mining Tool in the Search for Healthcare Fraud
Andrea Mclain, Cigna Health Insurance

It is estimated that tens of billions of dollars are lost each year to fraudulent healthcare insurance claims. The implications go well beyond financial losses and higher insurance premiums. For instance, many fraud schemes could result in patient exploitation or harm, or the illicit gains could be used in the furtherance of other criminal activities. Health insurance companies utilize data mining and predictive analytics to identify potentially fraudulent claims. Many third-party companies create products for this very purpose, where algorithms are used to flag claims exhibiting some known fraudulent pattern. Products built by these companies are exceptionally helpful in identifying and ultimately stopping and preventing insurance fraud, but many situations call for more, and a means beyond pre-built algorithms is necessary. This presentation is about one such instance, where a creative, on-the-fly data mining process was built within a SAS project to identify potential health insurance fraud in natural disaster scenarios, such as a hurricane or large-scale wildfire. This presentation will detail how an analyst started with millions of insurance claims and then utilized simple analytical methods within a SAS project to generate a small list of potentially fraudulent healthcare providers who billed for services they likely could not have rendered due to circumstances surrounding a natural disaster. The SAS skills required to create this process were basic, but speak to the larger concepts of intelligence analysis and data mining in the identification of a criminal pattern.


IN-061 : Leveraging RAREEVENTS Procedure Options to Monitor and Evaluate Infrequent Events in Healthcare
Lynn (Xiaohong) Liu, Ann & Robert H. Lurie Children's Hospital of Chicago
Roderick Jones, Ann & Robert H. Lurie Children's Hospital of Chicago

In healthcare, the purpose of statistical process control (SPC) is often to quantify improvements and identify unintended consequences resulting from an intentional change in an environment, policy, treatment protocol, or decision-support tool. The RAREEVENTS procedure has gained acceptance in healthcare quality improvement applications due to its suitability for infrequent, low-probability events. Enhancements to the procedure in SAS/QC version 15.1 allow users to apply tests to detect special-cause variation. We provide examples from healthcare, representing the geometric and exponential distributions, to describe approaches leveraging the RAREEVENTS procedure options, including READPHASE=, READINDEXES=, PHASEREF, PHASELEGEND, TESTS=, TESTACROSS, and TESTOVERLAP.


IN-064 : Detecting Side Effects and Evaluating Effectiveness of Drugs from Customers' Online Reviews using Text Analytics and Data Mining Models
Thu Dinh, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Miriam Mcgaugh, Oklahoma State University

Drug reviews play a very important role in providing crucial medical care information for both healthcare professionals and consumers. Customers increasingly use online review sites, discussion boards, and forums to voice their opinions and express their sentiments about the drugs they have used. However, a potential buyer would find it almost impossible to review all of these online comments before making a purchase decision. Another big challenge is the unstructured, qualitative, and textual nature of the reviews, which makes it difficult for readers to distill the comments into meaningful insights. The aim of the present paper is to identify a data-mining model to evaluate the effectiveness of, and detect potential side effects from, online customer reviews of specific prescription drugs. This study utilizes text parsing, text filtering, and text clustering within SAS® Enterprise Miner™ 14.3 for feature engineering and SAS® Sentiment Analysis Studio 12.2.5 for sentiment analysis. Further, multiple machine learning models, including logistic regression, decision tree, and neural network, are employed to identify an optimal model. The study's preliminary results show that the best predictive model for side effect detection is a neural network, with a validation misclassification rate of 23.4% and a sensitivity rate of 68.5%. For effectiveness classification, a neural network model also works best, with an 18.2% validation misclassification rate and a 91.6% sensitivity rate. These models will be further improved, and the information will be employed to evaluate model performance and validity. The results can serve as practical guidelines and useful references for prospective patients in making better-informed purchase decisions.


IN-080 : Do undergraduates need prerequisites for common courses?
Lin Qi, Oklahoma State University
Archana Chinnaswamy, InterWorks.Inc
Miriam Mcgaugh, Oklahoma State University

A good design of common courses is the starting point for a business school student's academic success. The Oklahoma State University (OSU) Spears School of Business recently revised its common core curriculum, reducing it to 10 core courses for all business school majors, none of which requires a prerequisite course for enrollment. Students may choose which course they want to take first. From a student's point of view, the lack of prerequisites allows more freedom in scheduling. However, enrolling in high-level courses before low-level courses may result in more students getting a D, failing the course, or withdrawing from it (DFW). To test the value of a more structured schedule or a potential prerequisite requirement, this paper analyzes the effect of common-course order on students' DFW rates for business common courses. Datasets containing demographic information for about 5,000 students and 50,000 common-course enrollment outcomes were supplied by the OSU Institutional Research & Information Management department. This research used SAS Studio for data preparation and SAS Enterprise Miner for data modeling, including decision tree, logistic regression, and neural network models. Final model selection was made according to the models' Average Square Error. The results show the decision tree as the best model, with high school core GPA and degree major as the variables significantly influencing DFW rates. The influence of course order is not significant. Undergraduates may not need prerequisite requirements to succeed in business school common courses.


IN-081 : Using SAS to predict the occurrence of study milestones used to initiate planned interim analysis
Benjamin Cronk, Amgen

In clinical trials, it is common for an interim analysis to occur upon reaching a study milestone such as a predetermined number of clinical events. Estimating the date at which the trial is likely to accrue those clinical events is beneficial for a number of reasons including planning for programming resources. This paper will use SAS/ETS procedures and clinical events observed at the beginning of a trial to estimate the time at which the trial is likely to have generated the number of clinical events needed to initiate planned interim analysis. The SGPLOT procedure will be used to generate figures that effectively show the number of events currently observed on the clinical trial as well as the projection of future adverse events including confidence intervals.
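A minimal sketch of this kind of projection, using SAS/ETS PROC ESM to extrapolate a cumulative event count and PROC SGPLOT to display the projection with confidence limits (the data set, variable names, and milestone value here are invented for illustration):

```sas
/* Forecast the cumulative event count 12 months ahead with
   double (Holt linear) exponential smoothing */
proc esm data=events_by_month outfor=proj lead=12;
   id month interval=month;
   forecast cum_events / model=linear;
run;

/* Plot observed events, the projection, and its confidence band;
   the REFLINE marks a hypothetical interim-analysis milestone */
proc sgplot data=proj;
   band x=month lower=lower upper=upper / transparency=0.6;
   series x=month y=actual;
   series x=month y=predict / lineattrs=(pattern=dash);
   refline 250 / axis=y label='Milestone';
run;
```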


IN-083 : Classifying Risk in Life Insurance using Predictive Analytics
Sai Gopi Krishna Govindarajula, Oklahoma State University
Miriam Mcgaugh, Oklahoma State University

Ever wonder how many companies offer life insurance? There are more than 600 companies in the US alone offering life insurance policies. Insurance companies perform an underwriting process to assess the risk of life insurance applicants and then price the policies if approved. Underwriters gather extensive information about applicants, including extensive health histories, to classify risk profiles. The process of collecting existing data for the risk assessment, completing and obtaining any required patient health exams, and validating all the information often takes several weeks to months. In this fast-paced world, customers are prone to lose interest in finalizing policies with companies that take a prolonged time to evaluate an application. With the advent of data analytics, the underwriting process can be streamlined and completed much faster. The intention of this project was to build predictive models based on past customer history and to recommend the most appropriate model to assess risk, resulting in better underwriting practices and customer retention. A real data set with around 140 variables, a combination of categorical and continuous, was analyzed using SAS® Enterprise Miner™ and Tableau® for predictive modeling and data visualization, respectively. Machine learning algorithms such as logistic regression and neural networks were implemented to assess risk, and findings revealed that the regression model showed the highest performance, with a misclassification rate of 21.09%.


IN-087 : Tuning Tufte: creating minimalist data visualizations with SG Plot
Dan Dewitz, WPS Health Solutions

Tufte wrote about the data-ink ratio--the data-ink divided by the total ink used to produce the graphic. The default settings of SG Plot defy this logic by drawing a border around the legend, the graph space, and the entire graphic! The aim of this paper is to provide pragmatic solutions for designing minimalist and effective graphics using SG Plot. This paper provides tips on designing the custom plot you want without having to learn the Graph Template Language (GTL). Topics covered include implementing custom color palettes, designing dynamic graphics that are robust and able to handle changing data, selectively highlighting key metrics, adding custom labels to time series plots, plotting transformed data, using broken scales, and adhering to widely accepted data viz design principles in SG Plot. This paper also implements definitive examples from the most popular graphing handbooks--The Visual Display of Quantitative Information, Storytelling with Data, The Functional Art--using SG Plot.
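The border-stripping idea can be sketched in a few lines; the options below (NOBORDER, NOWALL, and axis DISPLAY= suppression) are standard SGPLOT settings, shown here on SASHELP data rather than the paper's own examples:

```sas
ods graphics / noborder;                 /* drop the outer graphic border */

proc sgplot data=sashelp.stocks noborder nowall;
   where stock = 'IBM';
   series x=date y=close;
   xaxis display=(noline noticks);       /* strip non-data ink from axes */
   yaxis display=(noline noticks);
run;
```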


IN-088 : Timing is Everything: Defining ADaM Period, Subperiod and Phase
Nancy Brucken, Syneos Health

The CDISC Analysis Data Model Implementation Guide (ADaMIG) provides several timing variables for modeling clinical trial designs in analysis datasets. APHASE, APERIOD and ASPER can be used in conjunction with related treatment variables to meet a variety of analysis requirements, from single-period parallel studies to much more complicated situations involving multiple treatment periods and even different studies. The goal of this paper is to illustrate how some of these study designs may be handled in ADaM, and provide guidelines for selecting when to use the different timing variables that are available.


IN-091 : Simulating Skewed Multivariate Distributions Using SAS: The Cases of Lomax, Mardia's Pareto (Type I), Logistic, Burr and F Distributions
Zhixin Lun, Oakland University
Ravindra Khattree, Oakland University

By using various built-in functions in SAS software, it is convenient to generate data from several common multivariate distributions, such as the multivariate normal (RANDNORMAL function) and multivariate Student's t (RANDMVT function). However, functions for directly generating data from other, less common multivariate distributions are not readily available in SAS. We will illustrate how to simulate and generate random numbers from a multivariate Lomax distribution. The importance of this work lies in its wide applicability in reliability theory and many other situations where one needs a flexible, nonnegative, skewed multivariate distribution for modeling. Further, based on various useful properties of the multivariate Lomax distribution, Mardia's multivariate Pareto of type I, multivariate logistic, multivariate Burr, and multivariate F random variables can be readily simulated. We develop and implement a SAS macro using SAS/IML to generate random numbers from these multivariate probability distributions.
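One standard construction of the multivariate Lomax is a gamma mixture of exponentials: draw a shared gamma variate G and set each component to an independent exponential divided by G. A minimal SAS/IML sketch of that construction (not the authors' macro; the shape and rate values are illustrative):

```sas
proc iml;
   call randseed(2019);
   n = 1000;  a = 3;  theta = {1 2};      /* gamma shape; exponential rates */
   g = randfun(n, "Gamma", a);            /* shared gamma mixing variable   */
   x = j(n, ncol(theta), .);
   do jcol = 1 to ncol(theta);
      e = randfun(n, "Exponential");      /* standard exponential draws     */
      x[, jcol] = (e / theta[jcol]) / g;  /* divide by the common gamma     */
   end;
   /* rows of x are draws from a bivariate Lomax distribution */
quit;
```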


IN-092 : The Optimal Flight Ticket Price Model Based on Bivariate Normal Distribution
Tao Shu, Eli Lilly

Many prediction models, linear or nonlinear, are driven by explanatory factors, yet in many cases they fail to generate the expected results. This is because of the limits of the analytic model, which cannot deal with the unknown or uncontrollable factors that affect the object we are interested in. Rather than trying to dig out all the factors, we can instead treat the variables as random. From a statistical perspective, the variables do have connections among them, and finding such characteristics is the key to decision making. For example, we can construct a bivariate normal distribution for two random variables. With that, we can decide one variable first and then find the best value for the second so as to reach our goal. This paper applies this idea to airline ticket price strategy. In simulation, the optimized model can produce better sales results than a random one.


IN-095 : Findings About: De-mystifying the When and How
Michael Wise, Syneos Health
Soumya Rajesh, Syneos Health

CDISC offers Findings About (FA) and Supplemental Qualifiers (SuppQual) to handle information that doesn't fit into standard domains - 'non-standard variables'. They are, however, quite distinct from each other, and the appropriate use of each may still lead to confusion. "When should FA be created?" or "When is it best to use SUPPQUAL?" These are important questions that can only be answered by asking additional data questions. When the data does not fit into the parent domain, it may only be mapped to SUPPQUAL if it relates to one parent record. However, almost all other situations are covered by FA - wherein data relates to multiple records, or a two-way relationship is needed, etc. FA would be the right approach then, because it has versatility beyond what's offered by SUPPQUAL. For example, FA would provide a way of storing symptoms along with the time that they began and relating each back to the AETERM in the AE dataset. In addition, FA as a stand-alone domain is also the only place to store information surrounding an event or intervention that has not been captured within any specific domain. This paper will present examples from a few different therapeutic areas or domain relationships to highlight the proper use of FA. Another scenario will look into how FA accommodates a many-to-many relationship. These examples should clarify the mysteries surrounding when and how to best use or create FA.


IN-113 : Sample Size and Design Considerations in Studies Assessing Non-Inferiority using Continuous Outcomes
Michael G. Wilson, IUSM

Software developers rely on incremental progress to achieve radical development breakthroughs. The same is true in medicine, manufacturing, and finance. For example, a new anti-diabetic medicine might not offer superior glycemic control, but it might be less expensive. Or a new device for use in hand surgery might not yield superior digital mobility, but might be easier for the surgeon to implant. Or perhaps micro-loans to novice entrepreneurs might not raise the economic output of the county, but might cultivate cooperation among local businesses. These are examples where the outcome of a new method might not be objectively worse, that is, non-inferior, but there would be some reason to replace the current method and instigate incremental progress. SAS users are often asked to size and design studies to test this kind of non-inferiority. Such a design requires consideration of the framework of the hypothesis setup, the directionality, the determination of the non-inferiority margin, and the proper analysis method. In this review, the rationale for these considerations will be presented, common misunderstandings clarified, and examples using SAS/STAT® given.
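A non-inferiority sample-size calculation of this kind can be sketched with PROC POWER, shifting the null to the margin via NULLDIFF= and using a one-sided test. The margin, standard deviation, and other values below are invented for illustration:

```sas
proc power;
   twosamplemeans test=diff
      nulldiff  = -2      /* non-inferiority margin (hypothetical)      */
      meandiff  = 0       /* assume the new method is truly equivalent  */
      stddev    = 5
      sides     = U       /* one-sided, in the favorable direction      */
      alpha     = 0.025
      power     = 0.9
      npergroup = .;      /* solve for the per-group sample size        */
run;
```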


Rapid Fire

RF-003 : A Visual Step-by-step Approach to Converting an RTF File to an Excel File
Kirk Paul Lafler, Software Intelligence Corporation

Rich Text Format (RTF) files incorporate basic typographical styling and word processing features in a standardized document that many programs and applications are able to read. In today's high-tech arena, the contents of an RTF file sometimes need to be viewed as, and even converted to, an Excel file. You would think that since both RTF and Excel are Microsoft standards this would be a simple process, but you may be surprised to find out that is not the case. Learn about several "free" web-based and online applications as well as traditional SAS®-based programming techniques that can be used to convert an RTF file to an Excel file.


RF-004 : Saving and Restoring Startup (Initialized) SAS® System Options
Kirk Paul Lafler, Software Intelligence Corporation

Processing requirements sometimes require the saving (and restoration) of SAS® System options at strategic points during a program's execution cycle. This paper and presentation illustrate the process of using the OPTIONS, OPTSAVE, and OPTLOAD procedures to perform the following operations: display portable and host-specific SAS System options and their settings; display restricted SAS System options; display SAS System options that can be restricted; display information about SAS System option groups; display a list of SAS System options that belong to a specific group; display a list of SAS System options that can be saved; save startup SAS System options; and restore startup SAS System options, when needed.
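The save/restore cycle described above can be sketched in a few lines; this is only an illustrative skeleton of the three procedures, not the paper's full treatment:

```sas
proc options group=errorhandling;   /* display the options in one group     */
run;

proc optsave out=work.startopts;    /* snapshot the current option settings */
run;

options nodate nonumber;            /* temporary changes for a report step  */
/* ... processing that depends on the altered options ... */

proc optload data=work.startopts;   /* restore the saved settings           */
run;
```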


RF-017 : 3 ways to get Pretty Excel-Style Tables: PROC REPORT, PROC TABULATE, and Help from EG
Brooke Ellen Delgoffe, Marshfield Clinic Research Institute

In many cases SAS programmers may be asked to provide tabular or summary data in an Excel-style format (stacked headers, colored headers, bolded total lines, etc.). This paper explores 3 different ways to produce and export Excel-style tables to ODS destinations or the results window using SAS 9.4 or Enterprise Guide 7.15. The addition of ODS style elements will help readers apply aesthetically pleasing colors and formatting to their output. The use of Enterprise Guide will help users new to SAS perform these tasks with little to no SAS programming knowledge, while helping more versed SAS programmers utilize pre-written code as a starting point. An exploration of how to use PROC SQL in combination with PROC REPORT to display distinct counts and other uniquely formatted summary statistics will give programmers a succinct way to display summary statistics in the midst of required value duplication. For example, multiple Body Mass Index (BMI) entries for a single patient identifier may be needed to provide mean BMI per patient per period, but distinct patient counts may still be desired. A series of examples like this one will be used to cover each of the methods.
Brief Outline:
- Introduction
- PROC SQL with PROC REPORT
  - Using PROC SQL to obtain and format counts of interest
  - Using PROC REPORT to display grouped data with stacked headers
  - Customizing PROC REPORT output with style elements
  - EXAMPLE: Distinct patient counts (fake patient data provided in a DATA step)
- PROC TABULATE for presenting multiple statistics on the same variable
  - Creating a grouped table of summary statistics with stacked headers
  - Exporting to the ODS Excel destination
  - EXAMPLE: Cars data with stacked headers using SASHELP.CARS
- Summary Tables Builder in Enterprise Guide
  - Creating the same output as the PROC TABULATE example above
  - Step-by-step process with screenshots
  - Formatting values with Enterprise Guide
  - Filtering data in the Summary Tables Builder
SAS Versions: SAS EG 7.15 (HF7)*, SAS 9.4 (TS Level 1M3) (*dependency)
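A minimal sketch of the PROC TABULATE piece of this outline, using SASHELP.CARS and the ODS Excel destination (the output path is a placeholder):

```sas
ods excel file='cars_summary.xlsx';

proc tabulate data=sashelp.cars;
   class origin type;
   var mpg_city;
   table origin*type,                 /* stacked row headers            */
         mpg_city*(n mean);           /* multiple statistics per column */
run;

ods excel close;
```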


RF-025 : Using SAS to recreate Mike Bostock's creation of an E.J. Marey-inspired Rail Traffic Plot
Bill Qualls, First Analytics

Mike Bostock, a principal developer of the d3.js data visualization library, has published a program showing how he used d3.js to create an E.J. Marey-inspired graph showing San Francisco area commuter rail traffic. This paper will show how to create a string plot using SAS by recreating Bostock's graph. Along the way the reader will learn several useful hacks to improve their own use of SAS' PROC SGPLOT.


RF-029 : Using the XLSX libref engine with metadata available in Dictionary Tables
Michael Harper, A-Line Staffing

This paper will show how to make simple references to external Excel workbooks with the XLSX libref engine, and how to reference their metadata elements using the DICTIONARY tables.
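A minimal sketch of the technique (the workbook path is hypothetical): each worksheet becomes a member of the libref, and its columns then appear in DICTIONARY.COLUMNS like any other SAS data set:

```sas
libname xl xlsx '/data/budget.xlsx';   /* hypothetical workbook */

proc sql;
   select memname, name, type, length  /* sheet, column, and attributes */
      from dictionary.columns
      where libname = 'XL';
quit;

libname xl clear;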


RF-033 : The Doctor Ordered a Prescription…Not a Description: Driving Dynamic Data Governance Through Prescriptive Data Dictionaries That Automate Quality Control and Exception Reporting
Troy Hughes, Datmesis Analytics

Data quality is a critical component of data governance and describes the accuracy, validity, completeness, and consistency of data. Data accuracy can be difficult to assess, as it requires a comparison of data to the real-world constructs being abstracted. But other characteristics of data quality can be readily assessed when provided a clear expectation of data elements, records, fields, tables, and their respective relationships. Data dictionaries represent a common method to enumerate these expectations and help answer the question "What should my data look like?" Too often, however, data dictionaries are conceptualized as static artifacts that only describe data. This text introduces dynamic data dictionaries that instead prescribe business rules against which SAS® data sets are automatically assessed, and from which dynamic, data-driven, color-coded exception reports are automatically generated. Dynamic data dictionaries--operationalized within Excel workbooks--allow data stewards to set and modify data standards without having to alter the underlying software that interprets and applies business rules. Moreover, this modularity--the extraction of the data model and business rules from the underlying code--flexibly facilitates reuse of this SAS macro-based solution to support endless data quality objectives.


RF-034 : Abstracting and Automating Hierarchical Data Models: Leveraging the SAS® FORMAT Procedure CNTLIN Option To Build Dynamic Formats That Clean, Convert, and Categorize Data
Troy Hughes, Datmesis Analytics

The SAS® FORMAT procedure "creates user-specified formats and informats for variables." In other words, FORMAT defines data models that transform (and sometimes bin) prescribed values (or value ranges, in the case of numeric data) into new values. SAS formats facilitate multiple objectives of data governance, including data cleaning, the identification of outliers or new values, entity resolution, and data visualization, and can even be used to query or join lookup tables. SAS formats are often hardcoded into SAS software, but where data models are fluid, formats are best defined within control files outside of the software. This modularity--the separation of data models from the programs that utilize them--allows SAS developers to build and maintain SAS software independently while domain subject matter experts (SMEs) separately build and maintain the underlying data models. Independent data models also facilitate master data management (MDM) and software interoperability, allowing a data model to be maintained as a single instance, albeit implemented not only with SAS but also Python, R, or other languages or applications. The CNTLIN option (within the SAS FORMAT procedure) facilitates this modularity by creating SAS formats from data sets. This text introduces the BUILD_FORMAT macro that greatly expands the utility of CNTLIN, allowing it to build formats not only from one-to-one and many-to-one format mappings but also from multitiered, hierarchical data models that are built and maintained externally in XML files. The numerous advantages of BUILD_FORMAT are demonstrated through successive SAS code examples that rely on the taxonomy of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).
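A minimal sketch of the plain CNTLIN mechanism that BUILD_FORMAT builds upon (a flat one-to-one mapping with invented codes; the macro itself extends this to hierarchical XML models):

```sas
/* Control data set: one row per mapping */
data ctrl;
   length fmtname $8 start $1 label $12;
   fmtname = '$SEVF';                  /* character format */
   start = '1'; label = 'Mild';     output;
   start = '2'; label = 'Moderate'; output;
   start = '3'; label = 'Severe';   output;
run;

/* Build the format from the data set instead of hardcoded VALUE statements */
proc format cntlin=ctrl;
run;

/* Apply it, e.g.: severity_text = put(sev_code, $sevf.); */
```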


RF-038 : Evaluate your SCORE: Logistic regression prediction comparison using the SCORE statement
Robert G. Downer, Grand Valley State University

The SCORE statement in PROC LOGISTIC was introduced in SAS/STAT 9.0, and it is a feature that can be used to quickly evaluate prediction performance for new observations. Used in conjunction with the OUTMODEL and INMODEL options, the SCORE statement can be a very beneficial aid in quickly comparing the prediction performance of multiple logistic regression models for the same test or validation observations. The concise syntax of these statements will be illustrated. Performance criterion output, such as the misclassification rate, will be discussed through a worked example involving multiple models of a binary response. Although some knowledge of logistic regression would be beneficial for full understanding of this paper, it is written for a general audience interested in predictive modeling.
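A minimal sketch of the OUTMODEL/INMODEL/SCORE pattern (data set and variable names are invented): fit once, store the model, then score a holdout sample without refitting:

```sas
/* Fit the model and store it */
proc logistic data=train outmodel=work.mod1;
   model response(event='1') = x1 x2 x3;
run;

/* Score new observations from the stored model;
   FITSTAT adds fit statistics when the response is present */
proc logistic inmodel=work.mod1;
   score data=holdout out=scored fitstat;
run;
```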


RF-042 : Fuzzy Matching Commercial Entity Names
Richard Spotswood, Modeler

Analysts need to join tables on commercial entity names when no common keys exist between two tables. Commercial entity names differ from personal names: as a legal entity, the commercial entity name is generally standardized, but the commercial name is often subject to abbreviations, truncations, the addition of non-alphabetic characters, and re-characterization via branding. Previous papers on SAS® and fuzzy matching generally follow the Fellegi-Sunter record linkage methodology: data transforms are used to clean up data, followed by a Cartesian join and an edit-distance metric to obtain an acceptable match rate. This paper follows a similar methodology, but focuses exclusively on the frequent case where there are only abbreviations and truncations of the canonical name. Following Jaro and Winkler, who derived edit distances by explicitly modeling the location of transpositions in personal name transcriptions, this paper suggests that truncations and abbreviations in electronic records are best modeled as deletions from a canonical name and best captured through regular expressions and the Longest Common Subsequence (LCSQ). To illustrate the approach, the LCSQ metric is created by using COMPCOST and COMPGED within a PROC FCMP wrapper, while regular expression matching is done with two tables using a mixture of PRXCHANGE and PRXMATCH.
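The cost-adjusted edit-distance idea can be sketched in a DATA step; the paper wraps this logic in PROC FCMP and adds the regular-expression layer, so the costs and names below are only illustrative:

```sas
data _null_;
   a = 'INTERNATIONAL BUSINESS MACHINES CORP';
   b = 'INTL BUS MACHINES';
   /* Make deletions/insertions cheap relative to replacements,
      so truncation- and abbreviation-style edits score well */
   call compcost('insert=', 10, 'delete=', 10, 'replace=', 100);
   ged = compged(a, b);
   put ged=;
run;
```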


RF-058 : Like, Learn to Love SAS® Like
Louise Hadden, Abt Associates Inc.

How do I LIKE SAS®? Let me count the ways.... There are numerous instances where LIKE or LIKE operators can be used in SAS - and all of them are useful. This paper will walk through such uses of LIKE as: using the LIKE condition to perform pattern-matching; searches and joins with that smooth LIKE operator (and the NOT LIKE operator); the SOUNDS LIKE operator; and PROC SQL CREATE TABLE LIKE. We will explore the pros and cons of each LIKE functionality in SAS, and suggest alternatives if LIKE falls short of love.
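A minimal sketch of two of the LIKE flavors counted above, using a SASHELP table for illustration:

```sas
proc sql;
   /* Pattern matching: % matches any string, _ matches one character */
   select name
      from sashelp.class
      where name like 'J%';

   /* CREATE TABLE ... LIKE clones the structure with zero rows */
   create table work.shell
      like sashelp.class;
quit;
```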


RF-060 : DOMinate your ODS Output with PROC TEMPLATE, ODS Cascading Style Sheets (CSS), and the ODS Document Object Model (DOM)
Louise Hadden, Abt Associates Inc.
Troy Hughes, Datmesis Analytics

SAS® practitioners are frequently required to produce SAS output in mandatory formats, such as using a company logo, corporate or regulated government templates, and/or a cascading style sheet (CSS). SAS provides several tools to enable the production of customized output. Among these tools are the ODS Document Object Model, cascading style sheets, PROC TEMPLATE, and ODS style overrides (usually applied in procedures and/or in the originating data). This paper and presentation investigate "under the hood" of the Output Delivery System destinations and PROC REPORT, and show how mastering ODS TRACE DOM and controlling styles with the CSSSTYLE= option, PROC TEMPLATE, and style overrides can satisfy client requirements and enhance ODS output.
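A minimal sketch of two of these tools together (file names are placeholders): a CSS file applied destination-wide with CSSSTYLE=, plus a per-column style override inside PROC REPORT:

```sas
ods html5 file='report.html' cssstyle='corporate.css';

proc report data=sashelp.class;
   column name age;
   define age / display
      style(column)=[backgroundcolor=lightyellow];  /* inline override */
run;

ods html5 close;
```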


RF-067 : Breaking Human Trafficking Network: An Analytics Approach
Raj Laxmi Prakash, Oklahoma State University
Miriam Mcgaugh, Oklahoma State University

Labor migration, illegal sex work, child trafficking, and related crimes impact our society in many ways. The need to develop tools to discover and track human trafficking is extremely important and has drawn the attention of many researchers [1]. This exploratory paper describes a comprehensive analysis of the role of online classified advertisements in facilitating sex trafficking specifically and explores technological innovations to combat the growing network of human traffickers. With the growth of the internet and social media, human trafficking networks are spreading, aided by the ease of communication. On websites such as Backpage.com, online advertisements are posted to lure men, women, teens, and children [6]. These ads are used for selling as well as recruiting potential victims through manipulation and false promises such as job offers. Traffickers have become more sophisticated in their methods, making them seemingly untraceable and hiding their identities. In 2018, Backpage.com was seized by the FBI for its participation in illegal prostitution and sex trafficking. However, this did not end the problem but shifted it to unknown places [6]. To combat this growing problem, this paper uses the power of text and network analytics to build models for identifying different categories of advertisements and potentially connected relationships, including timing, locations, contact numbers, and other features of the ads. The data was obtained by scraping ads from sites like Backpage.com and analyzed using SAS tools such as SAS Enterprise Guide, SAS Viya, and SAS Enterprise Miner for text, network, and exploratory analysis.


RF-069 : The Advent of Renewable Energy
Sai Teja Sagi, Oklahoma State University
Miriam Mcgaugh, Oklahoma State University

Renewable energy accounted for 12.2% of total primary energy consumption and 14.9% of domestically produced electricity in the United States in 2016. The development of renewable energy and energy efficiency marked "a new era of energy exploration" in the United States (US), according to former President Barack Obama. This research focuses on how the usage of renewable and non-renewable energy has changed over the past 25 years in the US and identifies potential correlations in usage patterns. Secondly, the impact of economic and geopolitical factors is investigated. A clear picture of energy usage by the world's nations can help in making effective policies for renewable energy adoption. Audiences ranging from students to scientists will gain an understanding of energy usage around the world. The dataset was obtained from the United Nations website, and SAS Enterprise Guide was used for data preparation. Statistical methods such as time series analysis and trend-line studies are used to make the visualizations meaningful. Usage patterns of non-renewable energy are shown for the 25 years from 1990 to 2014 and compared to renewable energy for the same period. We can see the exponential increase in energy production from solar and wind and a decrease in the usage of charcoal and brown coal. Usage comparisons between countries such as the USA, Germany, and China are made to assess which countries have been influential in fueling this trend. The analysis shows that the USA has significant growth rates in solar and wind energy production, although its overall capacity is lower.


RF-077 : Utilizing Macros to Create Patient Site Matching via Zip-Code Radiuses
Kathryn Schurr, Quest Diagnostics

Determining how to match patients to clinical trial sites is a messy problem with many facets, including inclusion criteria, exclusion criteria, and proximity to clinical trial sites. With a site matching program in SAS®, users can take patient lists and easily determine whether a patient falls within a certain proximity of the site(s) of interest. This macro takes into account the Zip Code® location of the proposed site and the Zip Code location of the patient, then computes whether that patient should be assigned to any particular site. This paper builds upon an existing macro that calculates the Zip Codes within a given mile radius of a single site location.
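At the core of any such macro is a great-circle distance check between zip-code centroids. The following is a minimal editorial sketch in Python, not the paper's SAS macro; the centroid coordinates and the 50- and 200-mile radii are hypothetical:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))  # 3958.8 = Earth's mean radius in miles

def within_radius(patient_coord, site_coord, radius_miles):
    """True if the patient's zip-code centroid falls inside the site's radius."""
    return haversine_miles(*patient_coord, *site_coord) <= radius_miles

# Hypothetical centroids: a patient near Grand Rapids, a site in Chicago
patient = (42.96, -85.66)
site = (41.88, -87.63)
print(within_radius(patient, site, 50))   # outside a 50-mile radius
print(within_radius(patient, site, 200))  # inside a 200-mile radius
```

The SAS version could compute the same distance with the built-in ZIPCITYDISTANCE or GEODIST functions rather than coding the haversine formula by hand.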


RF-079 : US Airline Passenger Satisfaction using SAS Enterprise Miner
Harish Reddy Patlolla, Oklahoma State University
Miriam Mcgaugh, Oklahoma State University

In the past 20 years, the aviation industry has grown rapidly. This growth provides opportunities as well as challenges. While the opportunities arise from increasing demand, rival airlines pose a threat to long-established companies. Apart from optimizing pricing, have you ever wondered what airlines do to overcome these threats? Passenger satisfaction. Unhappy passengers mean fewer customers and less revenue. Therefore, it is important that passengers have a rich experience every time they travel. A satisfaction survey from 259,760 passengers, containing a combination of categorical and continuous variables, was used in this study to predict customer satisfaction from variables that airlines can easily obtain. Predictive models were built using decision trees and logistic regression. This study sought not only to explain the important factors that impact passenger satisfaction in the US airline industry, but also to examine how those factors change across age groups. SAS® Enterprise Miner™ and Tableau were used for predictive modeling and exploratory analysis, respectively. The decision tree model predicted customer satisfaction with 86% accuracy, indicating that in-flight entertainment, seat comfort, and ease of online booking were among the most important variables.


RF-100 : Surviving Survival Analysis 101: Making the Likelihood Ratio Test Easier Using a Macro
Katelyn Ware, Grand Valley State University
Rachel Baxter, Grand Valley State University

The likelihood ratio test is a commonly used hypothesis test to examine whether a nested model is a better fit than a full model. In survival analysis, the likelihood ratio test is a useful tool when deciding if interaction terms are needed in a stratified Cox proportional hazards model. When a certain covariate does not meet the proportional hazards assumption in survival data, it can still be included in the Cox proportional hazards model by stratifying on it. However, one must decide whether the slopes differ across strata. The -2LogL values are obtained by running the full and reduced models with PROC LIFEREG, and one can then manually calculate the p-value for a chi-square likelihood ratio test, but there is no automated option available. This paper describes a macro that automates this test for the user.
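Once the two -2LogL values are in hand, the test itself is simple arithmetic. Here is a hedged sketch in Python rather than the paper's SAS macro; the -2LogL values and degrees of freedom are invented, and the closed-form survival function shown is valid only for even degrees of freedom:

```python
from math import exp, factorial

def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X > x) for even df, using the
    closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

def likelihood_ratio_test(neg2logl_reduced, neg2logl_full, df):
    """G = (-2 log L_reduced) - (-2 log L_full) ~ chi-square(df) under H0,
    where df is the number of parameters dropped from the full model."""
    g = neg2logl_reduced - neg2logl_full
    return g, chi2_sf_even_df(g, df)

# Hypothetical -2LogL values from reduced (no interactions) and full models,
# with two interaction parameters dropped:
g, p = likelihood_ratio_test(540.6, 533.2, df=2)
print(g, p)
```

In practice one would read the -2LogL values from the PROC LIFEREG fit statistics tables and use a library chi-square distribution for arbitrary df.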


RF-105 : Comparing Dates without an Array
Laurie Smith, Cincinnati Children's Hospital Medical Center

This macro will help SAS v9.4 users, beginners and beyond, compare a subject's dates in separate observations against each other without using an array. PROC SQL, along with a DO loop, allows a user to compare and retain dates that satisfy a defined condition, creating a final dataset with those desired dates.
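The per-subject comparison such a macro performs can be sketched outside SAS as well. Below is a Python illustration; the data, the 30-day gap condition, and the earliest-date baseline are all hypothetical stand-ins for the paper's "defined condition":

```python
from datetime import date
from collections import defaultdict

# Hypothetical long-format data: one observation per subject per date
obs = [
    ("001", date(2019, 1, 5)),
    ("001", date(2019, 1, 20)),
    ("001", date(2019, 3, 1)),
    ("002", date(2019, 2, 10)),
    ("002", date(2019, 2, 12)),
]

def retain_dates(observations, min_gap_days=30):
    """Per subject, compare every date against the subject's earliest date
    (a self-join in the PROC SQL version) and retain those at least
    min_gap_days later -- no array needed."""
    by_subject = defaultdict(list)
    for subj, d in observations:
        by_subject[subj].append(d)
    kept = []
    for subj, dates in by_subject.items():
        first = min(dates)
        kept.extend((subj, d) for d in dates if (d - first).days >= min_gap_days)
    return kept

print(retain_dates(obs))  # only subject 001's 2019-03-01 observation qualifies
```

The PROC SQL analogue joins the table to itself on the subject identifier and applies the date condition in the WHERE clause.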


RF-108 : What Not to Do in a Program Used with %include
Stephanie Thompson, Datamum

This Rapid Fire session provides a fast look at what can make a program listed on a %INCLUDE statement fail, based on a frustrating real-world scenario.


RF-109 : 10 Cool Things You Can Do in a DATA STEP
Stephanie Thompson, Datamum

A look at some interesting things you can do in a DATA step that would be harder to do with other procedures, or that are just plain interesting. Highlights include different aspects of utilizing the PDV, joins, editing a file without bringing it into SAS, automatic variables, and a bit of what to do with DATA _NULL_. See the possibilities in the DATA step!


SAS 101 Plus

SP-002 : SAS® Macro Programming Tips and Techniques
Kirk Paul Lafler, Software Intelligence Corporation

The SAS® Macro Language is a powerful tool for extending the capabilities of the SAS System. Numerous tips, tricks and programming techniques related to the construction of effective macros are demonstrated. Topics include how to process statements containing macros; replace text strings with macro variables; generate SAS code using macros; manipulate macro variable values with macro functions; interface the macro language with the DATA step and SQL procedure; store and reuse macros; construct macros consisting of positional and keyword parameters; troubleshoot and debug macros; and develop efficient and portable macro language code.


SP-005 : SAS® Performance Tuning Techniques
Kirk Paul Lafler, Software Intelligence Corporation

The Base-SAS® software provides users with many powerful techniques for accessing, manipulating, analyzing, and processing data and results. With the availability of so many language features and the size of data sources, application developers, programmers and end-users can benefit from a set of guidelines for efficient use of the SAS software. Topics include a number of performance tuning techniques that can be applied to code and applications to conserve CPU, I/O, data storage, and memory resources while performing tasks more efficiently when sorting, grouping, merging (or joining), summarizing, transforming, and processing data.


SP-012 : PROC FORMAT with HTML - for useful Drill Down output in Web and/or Excel
Zeke Torres, RedMane Technology

This paper and its code combine common SAS features such as PROC TABULATE or PROC SUMMARY, PROC FORMAT, and ODS HTML/EXCEL to give the end user an OLAP-like set of reports. This simple example brings those elements together in a useful way, letting a SAS programmer share output with someone who isn't familiar with SAS but needs a way to drill down into the data, report, and results to investigate further and obtain more answers.


SP-018 : Utilizing SAS Macros, Do Loops and ODS to produce automated production quality individual profiles
George Vineyard, St Louis College of Pharmacy

Combining the features of SAS macros, macro arrays, DO loops, and ODS enables the researcher to expand their portfolio from data analyst to graphical designer. This paper demonstrates how to combine these elements of Base SAS to build automated reports with all the bells and whistles normally found only in standalone reporting tools.


SP-026 : Logistic Regression, Basics and Beyond
Bruce Lund, Independent Consultant

This paper presents light theory, supported by simulations, as well as practical suggestions for developing binary logistic regression models. Topics include: the Firth method versus the usual maximum likelihood method; screening, binning, and transforming predictors; identification of multicollinearity; oversampling for rare events; predictor selection methods using PROC LOGISTIC, HPLOGISTIC, and HPGENSELECT; and measures of fit and predictive accuracy. Products: Base SAS and SAS/STAT. Audience: intermediate users of SAS with some exposure to logistic regression.


SP-031 : User-Defined Multithreading with the SAS® DS2 Procedure: Performance Testing DS2 Against Functionally Equivalent DATA Steps
Troy Hughes, Datmesis Analytics

The Data Step 2 (DS2) procedure represents the first opportunity developers have had to build custom, multithreaded processes in Base SAS®. Multithreaded processing debuted in SAS 9, when built-in procedures such as SORT, SQL, and MEANS were threaded to reduce runtime. Despite this advancement, and in contrast with languages such as Java and Python, SAS 9 still did not give developers the ability to create custom, multithreaded processes. This limitation was overcome in SAS 9.4 with the introduction of the DS2 procedure, a threaded, object-oriented version of the DATA step. However, because DS2 relies on methods and packages (neither of which was previously available in Base SAS), both DS2 instruction and literature have predominantly fixated on these object-oriented aspects rather than on DS2 multithreading. This text is the first to focus solely on DS2 multithreading and its performance advantages. Common DATA step tasks such as data cleaning, transformation, and analysis are demonstrated, after which functionally equivalent DS2 code is introduced. Each paired example concludes with performance metrics that demonstrate faster runtimes with the DS2 language, even on a stand-alone laptop. All examples can be run in Base SAS and do not require in-database processing or the purchase of the DS2 Code Accelerator or other optional SAS components.


SP-043 : Look Up, Not Down: Advanced Table Lookup Techniques in Base SAS
Jayanth Iyengar, Data Systems Consultants LLC
Josh Horstman, Nested Loop Consulting

One of the most common data manipulation tasks SAS programmers perform is combining tables through table lookups. In the SAS programmer's toolkit many constructs are available for performing table lookups. Traditional methods for performing table lookups include conditional logic, match-merging and SQL joins. In this paper we concentrate on advanced table lookup methods such as formats, multiple SET statements, and HASH objects. We conceptually examine what advantages they provide the SAS programmer over basic methods. We also discuss and assess performance and efficiency considerations through practical examples.


SP-052 : Fifteen Functions to Supercharge Your SAS® Code
Josh Horstman, Nested Loop Consulting

The number of functions included in SAS® software has exploded in recent versions, but many of the most amazing and useful functions remain relatively unknown. This paper will discuss such functions and provide examples of their use. Both new and experienced SAS programmers should find something new to add to their toolboxes.


SP-053 : Using Macro Variable Lists to Create Dynamic Data-Driven Programs
Josh Horstman, Nested Loop Consulting

The SAS Macro Facility is an amazing tool for creating dynamic, flexible, reusable programs that can automatically adapt to change. In this paper, you'll see how macro variable lists provide a simple but powerful mechanism for creating data-driven programming logic. Don't hard-code data values into your programs. Eliminate data dependencies forever and let the macro facility write your SAS code for you!


SP-057 : Using ODS Trace (DOM), Procedural Output and ODS Output Objects to Create the Output of Your Dreams
Louise Hadden, Abt Associates Inc.

SAS® procedures can convey an enormous amount of information, sometimes more than is needed. The ODS TRACE and ODS TRACE DOM statements let us discover which output objects and underlying style information are created by each invocation of a SAS procedure and its options. By manipulating procedural output and ODS output objects, we can pick and choose just the information we want to see and report on. We can then harness the power of SAS reporting procedures and various ODS destinations to present the information accurately and attractively. This presentation is suitable for all levels of proficiency. Examples shown were run using SAS 9.4 Maintenance Release 5 on a Windows Server platform.


SP-065 : Quick, Call the "FUZZ": Using Fuzzy Logic
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

SAS® practitioners are frequently called upon to compare data between two different data sets and find that the values in synonymous fields do not line up exactly. A second quandary occurs when there is one data source to search for particular values, but those values are contained in character fields in which they can be represented in myriad ways. This paper discusses robust, if not warm and fuzzy, techniques for comparing and selecting data in SAS data sets under less-than-ideal conditions.


SP-072 : Powerful SAS® Output Delivery with ODS EXCEL
LeRoy Bessler, Bessler Consulting and Research

Results prepared with SAS are often destined for an Excel workbook: everyone already has Excel and knows how to use it to reformat or further explore results however they wish. ODS EXCEL enables a SAS programmer to create highly formatted reports, tabular or graphic or a combination of both, that can be opened and used in Excel. You can turn on customization and formatting features in SAS that would otherwise be applied manually inside Excel, delivering an already finished product to the viewer of the report. ODS EXCEL does not require Excel to be installed on the machine that creates the output; you can use it running SAS on MVS, UNIX, Linux, or Windows. No prior knowledge is assumed. ODS EXCEL output requires Microsoft Excel 2010 or later.


SP-093 : Urge to Merge? Maybe You Should Update Instead.
Ben Cochran, The Bedford Group, Inc.

Many SAS users need the functionality of the UPDATE statement, but they don't know about its built-in features, so instead they try to perform an update operation with the MERGE statement. The UPDATE statement incorporates powerful built-in logic that can make this operation a very simple programming endeavor. This paper explores some features of the UPDATE statement and why you would want to use it instead of the MERGE statement.


SP-110 : Creating In-line Style Macro Functions
Arthur Li, City of Hope

The macro functions we use in our programs are defined by the macro facility: given values for its parameters, a macro function generates a result that can be inserted directly into a macro statement. Programmers seldom realize that they can create user-defined macro functions as well. This paper focuses on methods of creating an in-line style macro function, illustrated through various examples.


e-Poster

PO-023 : Creating a True LSF Batch Job Submission Capability on SAS EG in a SAS Grid
Derek Grittmann, General Dynamics Federal Civilian Health
Adam Hendricks, General Dynamics Federal Civilian Health

See attachment


PO-028 : Generating SAS Datasets from ASCII Files Using a Crosswalk
Jose Centeno, NORC at the University of Chicago

In real-life applications, it is common to use a corresponding crosswalk to read raw files into SAS, especially when dealing with numerous output files and hundreds of variables with particular formats attached. In many cases, you will find yourself with the task of removing, adding, and/or updating variables, which can become challenging or tedious. This paper describes how, with the help of a few macros, we can reduce this effort and greatly decrease the number of lines of code in the main program. The resulting program will be easier to maintain, less error-prone, and easily deployed for other projects. This paper assumes a basic understanding of SAS DATA step programming and SAS macros.


PO-030 : Badge in Batch with Honeybadger: Generating Conference Badges with Quick Response (QR) Codes Containing Virtual Contact Cards (vCards) for Automatic Smart Phone Contact List Upload
Troy Hughes, Datmesis Analytics

Quick Response (QR) codes are widely used to encode information such as uniform resource locators (URLs) for websites, flight passenger data on airline tickets, attendee information on concert tickets, or product information on packaging. The proliferation of QR codes is due in part to the broad dissemination of smart phones and the accessibility of free QR code scanning applications. With the ease of self-scanning QR codes has come another common usage: the identification of conference attendees. Conference badges, emblazoned with an attendee-specific QR code, can communicate attendee contact and other personal information to other conference goers, including organizers, vendors, potential customers or employers, and other attendees. Unfortunately, some conference organizers choose not to include QR codes on badges because of the complexity and cost of producing them. To that end, this text introduces flexible Base SAS® software that overcomes this limitation by dynamically creating attendee QR codes from a data set containing contact and other information. Furthermore, the flexible, data-driven approach creates attendee badges that can be maintained and printed by conference organizers. When a badge QR code is scanned by a fellow conference goer, the attendee's personal information, including name, job title, company, phone number, email address, city, state, website, and biographical statement, is captured in a virtual contact card (vCard, a .vcf file) that can be uploaded automatically into a smart phone's contact list. Attendees can select what personal information is contained within their QR code, and conference organizers can customize and configure badge format and content through an external cascading style sheet (CSS) file that dynamically alters badges without modifying the underlying code. This end-to-end system offers conference organizers potential cost savings of hundreds of dollars!


PO-040 : Configuration and Usage of SAS®Py on Grid 9.4
Venkateswarlu Toluchuri, Tech Lead SAS Administrator

With the introduction of the official SASPy package, it is now trivial to incorporate SAS® into new workflows leveraging the Jupyter Notebook coding and publication environment, along with the broader Python data science ecosystem that comes with it. This paper and presentation provide an overview of installing and configuring SASPy with a SAS® Grid environment using the IOM method, and of setting up dedicated grid queues to balance workload properly. They also cover general principles of passing data between Python DataFrames and SAS data sets, as well as the unique advantages SAS brings to the notebook workspace and Python ecosystem. A number of new possibilities have emerged for integrating SAS® into tools widely used in the Python corner of data science. Yet given the number of potentially overlapping components involved (Jupyter, SASPy, SWAT, Pipefitter), there is potential for confusion regarding the practical installation and setup of SASPy with a SAS® 9.4 Grid and of dedicated queues for Python sessions. This paper focuses on installing and configuring SASPy on a client machine and integrating it with a SAS® 9.4 Grid environment, the pieces likely to have the broadest immediate audience and benefit: primarily Jupyter notebooks and SASPy, which together offer an excellent starting point toward the many benefits Python integration can bring to SAS® workflows and analytics projects.


PO-045 : Have Your SAS Program and Schedule It Too!
Mario Tejada, NORC at the University of Chicago

In some SAS environments, it is common to use the Windows Task Scheduler to launch production jobs on a set schedule. To create a basic task, one needs to manually open Task Scheduler and enter the desired parameters for the job. This paper explores using SAS to drive the scheduling process: instead of going into Task Scheduler, the programmer can use SAS to set the parameters of a job and programmatically register that task in Windows using PowerShell. This paper assumes a basic understanding of SAS DATA step programming and SAS macros, as well as administrator-level access to run Windows PowerShell commands and create scheduled tasks.


PO-070 : Levels Do Count - A New Dimension To The Interaction Effect In 3-way Factorial Analysis
Varsha Ganagalla, Grand Valley State University
Daniel Adrian, Grand Valley State University

The standard output from the analysis of factorial experiments with three fixed factors A, B, and C includes the sums of squares and associated tests for the factorial effects: the main effects A, B, and C; the two-factor interactions AB, AC, and BC; and the three-factor interaction ABC. However, if ABC is significant, analysis often proceeds by testing the two-factor interactions at each level of the third factor, which is not part of the standard output. Most of the heavy lifting can be accomplished with two-way ANOVAs and BY statements, but the resulting tests need to be adjusted to incorporate the mean squared error and error degrees of freedom from the entire dataset. This can be extremely cumbersome if attempted manually. We present SAS macro code that automates this analysis for any three-factor dataset, reducing both time and programming errors.


PO-071 : Frequency matching case-control techniques: an epidemiological perspective
Hai Nguyen, UIC-School of Public Health

(Please refer to the abstract in the Word file in the Submission File, which includes a hierarchy diagram and a table of results.)


PO-082 : Oh, There's No Place Like SAS ODS Graphics for the Holidays!
Ted Conway, Self

Already a SAS ODS Graphics user at work, the author used the (free!) SAS University Edition software he'd recently downloaded and installed on his home laptop to knock out a connect-the-dots Tom Turkey with PROC SGPLOT to commemorate Thanksgiving 2015. And so began an ongoing series of "Fun with SAS ODS Graphics" posts on the SAS Support Communities and Twitter that celebrated major holidays and other events. While creating these admittedly frivolous charts from the comfort of his easy chair, the author learned some useful techniques for creating serious data vizzes, which will be shared in this e-Poster and the accompanying paper.


PO-097 : Exploring Wine Reviews: How Language and Word Use Varies in Wine Reviews
Abigail Zysk, Grand Valley State University
Kylie Springer, Grand Valley State University

With thousands of varieties of wine, wine descriptions are diverse and unique to the individual describing the wine. A dataset including the wine variety, reviewer, and wine descriptor/flavor words was used to explore the frequency of word use within certain varieties of wine, for individual reviewers, and for the combination of variety and reviewer. By examining the word usage of the top reviewers across different varieties of wine, we saw that the most used wine descriptor words were not exclusive to varieties of wine but dependent on the wine reviewer. Roger Voss, who had the largest number of reviews, used the word 'rich' 13.37% of the time when using a wine descriptor word. This was reflected across different varieties of wines: when reviewing Bordeaux-Style Red Blends he used the word 'rich' 13.57% of the time when using any wine descriptor words, 13.41% for Chardonnay, and 16.09% for Malbec. When looking at word usage for a single variety of wine, we concluded that the words favored by reviewers could influence the results. When exploring the most used wine descriptor words for Bordeaux-Style Red Blends and Rosé wine, the word 'rich' was one of the top words for both, which could be due to Voss frequently using the word. Based on these results, when selecting a bottle of wine it would be helpful to look at reviews from a multitude of different wine reviewers to get an accurate description of the wine.
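The descriptor-share calculation behind figures like "13.37% of the time" is a straightforward frequency count. A small Python sketch (the records and numbers here are invented, not the paper's data):

```python
from collections import Counter

# Hypothetical (reviewer, variety, descriptor) records extracted from reviews
records = [
    ("Roger Voss", "Malbec", "rich"),
    ("Roger Voss", "Malbec", "fruity"),
    ("Roger Voss", "Chardonnay", "rich"),
    ("Roger Voss", "Chardonnay", "crisp"),
    ("Other Reviewer", "Malbec", "bold"),
]

def descriptor_share(records, reviewer, word):
    """Share of a reviewer's descriptor-word uses accounted for by one word."""
    counts = Counter(w for r, _, w in records if r == reviewer)
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

print(descriptor_share(records, "Roger Voss", "rich"))  # 2 of 4 uses -> 0.5
```

Restricting the generator to a single variety gives the per-variety shares reported in the abstract.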


PO-098 : Profiling hospital length of stay using the mode
Anne Cain-Nielsen, University of Michigan
Scott Regenbogen, University of Michigan

Measures of hospital length of stay (LOS) are often used to compare hospital performance and are frequently used in health services research applications. While often appropriate, common metrics of central tendency used for length of stay profiling, such as the mean or median, can be sensitive to outlying values or may not identify representative patterns of care. In an analysis using national Medicare data, we considered three possible profiling measures: hospital mean, median, and mode LOS. We wished to profile the intended or 'typical' postoperative LOS for beneficiaries who underwent total hip replacement (THR, 231,774 patients in 1,831 hospitals), coronary artery bypass grafting (CABG, 218,940 patients in 1,056 hospitals), or colectomy (189,229 patients in 1,876 hospitals). For all three procedures, mean LOS was the metric most sensitive to outlying values (e.g. longer lengths of stay associated with post-operative complications). For CABG and colectomy, median LOS was also longer than the most typical (mode) postoperative care pathway. We will illustrate how hospital mode length of stay can be easily calculated using SAS 9.4, and demonstrate that the mode can be an appropriate metric for profiling hospital length of stay for certain analytic objectives. This presentation would be relevant for any level of SAS user whose work involves profiling hospital length of stay (e.g. health services research, hospital quality improvement, medicine).
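The contrast among the three profiling metrics is easy to reproduce. A Python sketch with invented lengths of stay (the paper's analysis used SAS 9.4 and national Medicare data):

```python
from statistics import mean, median, mode

# Hypothetical post-op lengths of stay (days) at one hospital: most patients
# follow a 4-day pathway, with a few long stays from complications
los = [4, 4, 4, 4, 4, 5, 5, 6, 21, 35]

print(mean(los))    # pulled upward by the two outliers
print(median(los))  # less sensitive, but still above the typical stay
print(mode(los))    # the most common (typical) care pathway
```

In SAS, the equivalent per-hospital mode could be obtained with PROC FREQ or PROC UNIVARIATE by hospital.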


PO-099 : Seeing the Things We Love with SAS
Laurie Smith, Cincinnati Children's Hospital Medical Center

It's fun to use SAS (even Base SAS) to see the things we love in a different light. As a huge hip hop fan, I was interested in using SAS to view the music I listen to differently, by creating a visual (graphical) comparison of an artist's lyrical content versus how well each track did on the Billboard Hot 100 chart. The approach: obtain a list of the singles by rank, use an API to import the lyrics from Musixmatch with SAS, and then classify the lyrics into different categories. I don't know yet whether there is a correlation, but it's interesting to research. This is definitely not a typical use for SAS, but it's a fun way to step out!