Proceedings


MWSUG 2018 Paper Presentations

Paper presentations are the heart of a SAS users group meeting. MWSUG 2018 will feature dozens of paper presentations organized into several academic sections covering a variety of topics and experience levels.

Note: Content and schedule are subject to change. Last updated 12-Jul-2018.



Business Leadership

Paper No. Author(s) Paper Title (click for abstract)
BL-003 Kirk Paul Lafler Differentiate Yourself
BL-028 Brandy Sinco Computer Karma: Non-Monetary Benefits from Statistical And Information Technology Volunteer Work
BL-043 Chuck Kincaid How to Succeed in Consulting
BL-050 Nancy Brucken Customized Project Tracking with SAS and Jira
BL-064 Josh Horstman So You Want To Be An Independent Consultant
BL-090 Sridevi Loya Analyzing YouTube comments on gun violence using SAS® Viya and SAS® Text Miner
BL-101 Paul Segal Analytics in the Cloud: Beware of "hidden" costs
BL-103 Paul Segal Accelerate Your Analytics with SAS® and Teradata Using Disparate Data Sources
BL-104 Peter Eberhardt What is Leadership?
BL-112 Ying Shi Modified Response Evaluation Criteria in Solid Tumors for Immuno-Oncology Clinical Trials
BL-122 Chuck Kincaid How to HOW: Hands-on-Workshops Made Easy
BL-127 Troy Hughes From Readability to Responsible Risk Management: Facilitating the Automatic Identification and Aggregation of Software Technical Debt within an Organization Through Standardized Commenting in SAS® Program Files and SAS Enterprise Guide Project Files
BL-144 Tho Nguyen Become a Data-Driven Organization with People, Process and Technology
BL-146 Amy Peters Comparing SAS® Viya® and SAS® 9.4 Capabilities: A Tale of Two SAS Platform Engines
BL-147 David Corliss Data For Good as a Community Service Project at Work


Hands-on Workshops

Paper No. Author(s) Paper Title (click for abstract)
HW-009 Kirk Paul Lafler A Hands-on Introduction to SAS® Metadata DICTIONARY Tables and SASHELP Views
HW-033 Richann Watson & Kriss Harris Animate Your Data!
HW-055 Jayanth Iyengar Understanding Administrative Healthcare Data sets using SAS programming tools
HW-065 Josh Horstman Getting Started with the SGPLOT Procedure
HW-098 Kent Phelps & Ronda Phelps The Joinless Join ~ The Impossible Dream Come True Using SAS® Enterprise Guide® and Base SAS® PROC SQL and DATA Step; Expand the Power of SAS® Enterprise Guide® and Base SAS® in New Ways
HW-099 Peter Eberhardt The Baker Street Irregulars Investigate: Discoveries Using Perl Regular Expressions and SAS®


Health Sciences

Paper No. Author(s) Paper Title (click for abstract)
HS-027 Brandy Sinco et al. Tips, Tricks, and Traps on Longitudinal Data Analysis with Discrete and Continuous Times
HS-036 Michael Battaglia et al. How to Navigate in a Maze of the Raking Macro with Advanced Weight Trimming
HS-038 Michael G. Wilson Assessing Model Adequacy in Proportional Hazards Regression
HS-039 Xiaoting Wu et al. Using SAS® to Validate Clinical Prediction Models
HS-045 Xi Chen et al. Pan-Cancer Epigenetic Biomarker Selection from Blood Sample Using SAS®
HS-053 Jamie Kammer et al. Creating suicide attempt/intentional self-harm episodes using administrative billing data
HS-071 Roderick Jones & Lynn (Xiaohong) Liu Leveraging SHEWHART Procedure Options to Monitor and Evaluate Improvements in Healthcare
HS-073 Laurie Smith A Macro to Import Subject Data Saved in a Location with Separate Subfolders for each Subject
HS-074 Lynn (Xiaohong) Liu & Roderick Jones Use of SAS Macros to automate the Production of Statistical Process Control Charts
HS-079 Mohsen Asghari et al. SAS Text-mining tools applied to Medical Information Assessment: ICD-10 code retrieval
HS-080 Sunil Kumar A Data Mining Approach to Predict Dental Adverse Events
HS-083 Brian Mosier et al. A Macro to Calculate Sample Size for Studies Using the Proportional Time Assumption
HS-088 Jennifer Scodes Baseline Mean Centering for Analysis of Covariance (ANCOVA) Method of Randomized Controlled Trial Data Analysis
HS-094 David Corliss Genocide Modeling - Historical Risk Factors and Odds Ratios
HS-118 Adams Kusi Appiah Bootstrap Linear Mixed-Effects Models using SAS Procedures
HS-119 Xuelin Li et al. Automated Transfer of a Sea of SAS® Programs between Data Transfers
HS-123 Pratap Kunwar A Macro to Add SDTM Supplemental Domain to Standard Domain
HS-132 Troy Hughes Toward Adoption of Agile Software Development in Clinical Trials
HS-136 Rishabh Mishra Addressing Opioid Crisis using Data Science
HS-142 Zeqing Lu et al. Spotfire Clinical visualizations from SAS and R
HS-143 Hillary Graham et al. AutoPDF : an R Package to Output Vector Graphics


SAS 101 Plus

Paper No. Author(s) Paper Title (click for abstract)
SP-002 Kirk Paul Lafler Making Your SAS® Output, Results, Reports, Charts and Spreadsheets More Meaningful with Color
SP-048 John Schmitz Using Multilabel Formats with PROC SUMMARY to Generate Report Data with Overlapping Time Segments
SP-049 Jack Shoemaker Data-driven Data Analysis
SP-057 Ting Sa A Macro that Can Get the Geo Coding Information from the Google Map API
SP-061 Donna Levy & Nancy Brucken Seeing the Forest for the Trees: Part Deux of Defensive Coding by Example
SP-062 Veronica Renauldo Efficiency Programming with Macro Variable Arrays
SP-063 Josh Horstman Dating for SAS Programmers
SP-066 Josh Horstman Merge with Caution: How to Avoid Common Problems when Combining SAS Datasets
SP-069 Larry Riggen What's the Difference? Using the PROC COMPARE to find out.
SP-075 Margaret Kline & Daniel Muzyka From Clicking to Coding: Using ODS Graphics Designer as a Tool to Learn Graph Template Language
SP-076 Jayanth Iyengar Tips, Traps, and Techniques in BASE SAS for vertically combining SAS data sets
SP-078 Jacob Keeley & Carl Nord Improving Plots Using XAXISTABLE and YAXISTABLE
SP-084 Lindsey Whiting & Joey Kaiser Automating SAS Job Streams With the Power of VB Script
SP-100 Peter Eberhardt Using SASv9.cfg, autoexec.sas, SAS® Registry, and Options to Set Up Base SAS®
SP-106 Kaiqing Fan SAS Techniques to Handle Big Files And Reduce Execution times
SP-116 Louise Hadden Order, Order! Four Ways to Reorder Your Variables, Ranked by Elegance and Efficiency
SP-139 Warren Kuhfeld Keeping Up to Date with ODS Graphics


SAS 301 Beyond the Basics

Paper No. Author(s) Paper Title (click for abstract)
SB-010 Kirk Paul Lafler & Stephen Sloan A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software
SB-019 Kirk Paul Lafler Visual Storytelling - The Art of Communicating Information with Graphics
SB-021 Kaiqing Fan How to Assembly Line Create Graphic Images Using PROC TEMPLATE in SAS Enterprise Guide? Part I
SB-034 Barbara Okerson Backsplash patterns for your world: A look at SAS OpenStreetMap (OSM) tile servers
SB-040 Yurong Dai & Jiangang Jameson Cai Conversion of CDISC specifications to CDISC data - specifications driven SAS programming for CDISC data mapping
SB-052 John Schmitz Show Me That? Using SAS VIYA, Visual Analytics and Free ESRI Maps to Show Geographic Data
SB-059 Mark Keintz Finding National Best Bid and Best Offer - Quote by Quote
SB-060 Mark Keintz From Stocks to Flows: Using SAS® HASH objects for FIFO, LIFO, and other FO's
SB-081 Scott Koval Picture Perfect: An Introduction to the Image Action Set available with SAS® Viya® Programming
SB-089 Manideep Mellachervu & Anvesh Reddy Minukuri Analyzing Amazon's Customer Reviews using SAS® Text Miner for Devising Successful Product Launch Strategies
SB-093 Deanna Schreiber-Gregory & Karlen Bader Quality Control for Big Data: How to Utilize High Performance Binning Techniques
SB-102 Paul Segal Speed up your Data Processing with SAS Code Accelerator.
SB-114 Louise Hadden Wow! You Did That Map With SAS®?! Round II
SB-140 Jane Eslinger Square Peg, Square Hole-Getting Tables to Fit on Slides in the ODS Destination for PowerPoint
SB-141 Warren Kuhfeld Advanced ODS Graphics Examples
SB-145 Kaushal Chaudhary & Dhruba Ghimire Perl Regular Expression - The Power to Know the PERL in Your Data


SAS Super Demos

Paper No. Author(s) Paper Title (click for abstract)
SD-149 Danny Modlin Creating a Custom Task in SAS Studio
SD-150 Brett Wujek Executing Open Source Code in Machine Learning Pipelines of SAS Visual Data Mining and Machine Learning
SD-151 Brett Wujek Tune In to Model Tuning
SD-152 Warren Kuhfeld Highly Customized Graphs Using ODS Graphics
SD-153 Warren Kuhfeld Heat Maps: Graphically Displaying Big Data and Small Tables
SD-154 Jane Eslinger What's New in the ODS Excel Destination
SD-155 Jane Eslinger Creating Pivot tables using ODS Markup
SD-156 Cynthia Zender SAS 9.4 ODS in a Nutshell
SD-157 Cynthia Zender Accessibility with ODS Output
SD-158 Amy Peters The Future of SAS Enterprise Guide and SAS Studio


Statistics / Advanced Analytics

Paper No. Author(s) Paper Title (click for abstract)
AA-029 Matthew Bates Automatic Indicators for Dummies: A macro for generating dummy indicators from category type variables
AA-030 Michael Grierson Confounded? This example shows how to use SAS chi-square tests, correlations and logistic regression to unconfound a result.
AA-031 Bruce Lund Screening, Transforming, and Fitting Predictors for Cumulative Logit Model
AA-035 Ming-Long Lam Monitoring the Relevance of Predictors for a Model Over Time
AA-041 Peter Flom Alternative methods of regression when OLS is not right.
AA-042 Peter Flom An introduction to classification and regression trees with PROC HPSPLIT.
AA-047 Bruce Lund Propensity Scores and Causal Inference for (and by) a Beginner
AA-077 Kwideok Han Estimating the Impacts of the EDA Public Works Program on County Employments Using SAS/ETS 14.1.
AA-091 Deanna Schreiber-Gregory & Karlen Bader Logistic and Linear Regression Assumptions: Violation Recognition and Control
AA-092 Deanna Schreiber-Gregory & Karlen Bader Regularization Techniques for Multicollinearity: Lasso, Ridge, and Elastic Nets
AA-108 Pat Berglund Using SAS® for Multiple Imputation and Analysis of Longitudinal Data
AA-109 Palash Sharma Application of heavy-tailed distributions using PROC IML, NLMIXED and SEVERITY
AA-117 Yuting Tian An Introduction to the process of improving a neural network
AA-120 Min Chen Handling Missing Data in Exploratory Factor Analysis Using SAS
AA-121 Scott Koval How to Score Big with SAS Solutions: Various Ways to Score New Data with Trained Models
AA-137 Danny Modlin Getting Started with Bayesian Analytics
AA-138 Brett Wujek Introduction to Machine Learning in SAS


e-Posters

Paper No. Author(s) Paper Title (click for abstract)
PO-032 Richann Watson & Kriss Harris Great Time to Learn GTL
PO-044 Venkateswarlu Toluchuri Self-service utility to List and Terminate SAS grid jobs
PO-051 Nancy Brucken & Jared Slain An Update on the CS Standard Analyses and Code Sharing Working Group
PO-054 Robert Downer Using your FREQ effectively: Displays to Decipher Proportional Odds in Ordinal Regression
PO-068 Parag Vilas Sasturkar Factors Responsible for Students' Enrollment at Oklahoma State University
PO-107 Kaiqing Fan An Easy Way to Know When to Buy and When to Sell your Stocks
PO-111 Guangtao Gao How to Avoid Possible Tricks When Using DATA STEP MERGE Instead of PROC SQL JOIN
PO-115 Louise Hadden Purrfectly Fabulous Feline Functions




Abstracts

Business Leadership

BL-003 : Differentiate Yourself
Kirk Paul Lafler, Software Intelligence Corporation

Today's job and employment marketplace is highly competitive. As a result, SAS® professionals should do everything they can to differentiate and prepare themselves for the global marketplace by acquiring and enhancing their technical and soft skills. Topics include how SAS professionals can assess and enhance their existing skills using an assortment of valuable, and "free", SAS-related content; become involved, volunteer, and speak at in-house, local, regional, and international SAS user group meetings and conferences; and publish blog posts, videos, articles, and PDF "white" papers to differentiate themselves from the competition.


BL-028 : Computer Karma: Non-Monetary Benefits from Statistical And Information Technology Volunteer Work
Brandy Sinco, University of Michigan

Volunteer statistical and information technology work offers many rewards other than direct monetary payment. Many of these rewards are at least as valuable as money. First, volunteers can enhance their skills, which will be useful in a present or future job. Examples with SAS range from simple techniques, such as computing intraclass correlation coefficients with Proc Mixed, to complex analysis methods, such as learning how to bootstrap indirect effects with Proc CALIS. Other non-SAS examples are learning how to debug error messages about audio links in web pages and expanding knowledge of computer troubleshooting. To be successful as a data scientist, analysts must be continually willing to expand their skills through continuing education. Volunteer work enhances and supplements formal classes and workshops by providing learning opportunities without the rigid deadlines of a paid job. Second, volunteers can network with people who may later assist in finding a job. Real-life examples will be provided of people who found data analysis and information technology jobs by being involved in volunteer projects. Third, volunteers may connect with people who can assist with over-employment by serving as referrals for unwanted overtime projects. Again, real-life stories will be shared.


BL-043 : How to Succeed in Consulting
Chuck Kincaid, Experis Business Analytics

Maybe you are an awesome programmer working on a company's internal consulting team, but you have a hard time getting work done by the deadline. Maybe you are a strong SAS developer who does independent consulting, but clients get upset when changes they ask for cost them more money. Just because you're good at the technical skills doesn't mean you can succeed as a consultant. This presentation will give you tips on how to do just that. With a combination of project management and consulting skills you can go much farther, whether you're doing internal consulting, independent consulting, or working for a consulting company. This presentation will be good for people who want to do better at managing the world outside of their code.


BL-050 : Customized Project Tracking with SAS and Jira
Nancy Brucken, InVentiv Health Clinical

As programmers and statisticians, most of us are far better at programming tasks than we are at project management. Jira is a powerful and inexpensive commercially-available application designed to help programming teams track their progress on projects. It is built on top of a PostgreSQL database, which can be queried from SAS using SAS/ACCESS to ODBC to generate a variety of custom reports.
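The querying approach described above can be sketched in a few lines, assuming a SAS/ACCESS to ODBC license and an ODBC data source configured against the Jira database; the DSN name, table, and column names below are illustrative placeholders, not Jira's actual schema:

```sas
/* Hypothetical sketch: read Jira's PostgreSQL back end through an ODBC DSN.
   "jira_pg", PROJECT_ISSUES, and the column names are illustrative only. */
libname jira odbc datasrc="jira_pg" user=&dbuser password=&dbpass;

proc sql;
   create table open_issues as
   select issue_key, summary, assignee, status
   from jira.project_issues
   where status ne 'Closed';
quit;
```

Once the libref is assigned, the Jira tables behave like any other SAS data sets, so PROC REPORT or PROC SGPLOT can be pointed at them directly for custom status reports.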


BL-064 : So You Want To Be An Independent Consultant
Josh Horstman, Nested Loop Consulting

While many statisticians and programmers are content with a traditional employment setting, others yearn for the freedom and flexibility that come with being an independent consultant. While consulting can be a tremendous benefit, there are many details to consider. This paper will provide an overview of consulting as a statistician or programmer. We'll discuss the advantages and disadvantages of consulting, getting started, finding work, operating your business, and various legal, financial, and logistical issues.
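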


BL-090 : Analyzing YouTube comments on gun violence using SAS® Viya and SAS® Text Miner
Sridevi Loya, Student

Gun violence is a major cause of premature death in the U.S. "Guns kill more than 38,000 people and cause nearly 85,000 injuries each year," according to the American Public Health Association. Gun violence is a complex issue that is spread throughout the country, which is why it is important to understand the demographics where this violence occurs and how people respond to such incidents. This paper focuses on opinions about gun violence and reform in United States schools, including the views people expressed after the Florida mass shooting. On February 14, 2018, a mass shooting occurred at Marjory Stoneman Douglas High School in Parkland, Florida; seventeen people were killed and seventeen more were wounded. Data on similar incidents from the last few years is pulled from Kaggle.com. Using this data, descriptive analytics will be carried out to understand various metrics and create a demographic impression using SAS Enterprise Guide and SAS Viya. Text analytics will be performed with SAS Text Miner and SAS Enterprise Miner on comments people wrote on YouTube after the incident occurred, looking especially at two specific sources of textual data: the CNN news and ABC news sites. This paper should benefit a wide audience, as its main aim is to examine people's opinions and derive meaningful insights and viable recommendations to mitigate such acts.


BL-101 : Analytics in the Cloud: Beware of "hidden" costs
Paul Segal, Teradata

With the rush to the cloud, it is easy for the data scientist to be unaware of some of the "hidden" costs of using cloud-based infrastructure. These hidden costs can cause a large increase in the operational expenditure that gets billed to your department. In this talk we show you what those costs are, how easy it is to incur them, and how to avoid (or at least mitigate) them using SAS in-database processing for Teradata. This talk will also include a live demonstration of the techniques outlined.


BL-103 : Accelerate Your Analytics with SAS® and Teradata Using Disparate Data Sources
Paul Segal, Teradata

Analytics today often involves working with multiple data types from multiple storage types: traditional relational database management systems (RDBMSs) such as Teradata, Oracle, DB2, Microsoft SQL Server, and MySQL; file-system-type storage such as Apache Hadoop and Amazon Simple Storage Service (Amazon S3); and NoSQL sources such as MongoDB and Cassandra. Sourcing the data from a federated data layer brings its own share of issues, such as having to know all the details for every data platform (IP address, port numbers, logon details, data access mechanism, data query languages, and so on). Other drawbacks of a federated data space are that the data often needs to be replicated and stored (using up valuable disc space), and you might no longer be able to leverage processes that speed up your analytics (such as in-database or on-platform processing). In this presentation, we present a solution that addresses all these issues. Teradata QueryGrid combines the most comprehensive in-database solution from SAS with the Teradata RDBMS. With Teradata QueryGrid, you can access data from a wide variety of data sources using a common language (SQL), abstracting away the connection details so that you don't need to know them, all while using the tremendous performance of SAS® running inside the Teradata database.


BL-104 : What is Leadership?
Peter Eberhardt, Fernwood Consulting Group Inc

In this presentation we will talk about the nature of leadership; that is, what it takes to be a leader. We will also look at the difference between being a manager and being a leader. The discussion is industry agnostic. It is applicable to all levels of SAS users, but those just starting out on their careers will find it thought provoking.


BL-112 : Modified Response Evaluation Criteria in Solid Tumors for Immuno-Oncology Clinical Trials
Ying Shi, Eli Lilly and Company

Immunotherapeutic agents may produce antitumor effects by potentiating endogenous cancer-specific immune responses. The response patterns may extend beyond the typical time course and can manifest a clinical response after an initial cancer progression. Standard RECIST may not provide an accurate response assessment for immuno-oncology trials, but there is no established analysis standard across the pharmaceutical industry. This paper explains one way to modify RECIST 1.1 and the related analysis needs, together with derivations of best overall response using SAS. There is no dependency on operating system or SAS version. Knowledge of oncology clinical trials is assumed for the audience.


BL-122 : How to HOW: Hands-on-Workshops Made Easy
Chuck Kincaid, Experis Business Analytics

Have you ever attended a Hands-on-Workshop and found it useful? Many people do! Being able to actually try out the things that you're learning is a wonderful way to learn. It's also a great way to teach: you can see whether people can apply what they're learning. Have you ever thought that it would be fun to teach other people in a hands-on format? Maybe you weren't sure what it takes or how to approach the course. This presentation will help you with those questions and struggles. What to teach? How much to teach? How should I teach it? How is a Hands-on-Workshop different from lecture style? How much to put into PowerPoint slides? What if they ask me something I don't know? What if they have a computer problem? All those questions that you have will be answered in this presentation.


BL-127 : From Readability to Responsible Risk Management: Facilitating the Automatic Identification and Aggregation of Software Technical Debt within an Organization Through Standardized Commenting in SAS® Program Files and SAS Enterprise Guide Project Files
Troy Hughes, Datmesis Analytics

Software readability is greatly improved when programs include descriptive comments in a predictable, standardized format. Program headers that describe software requirements, author, creation date, versioning history, caveats, and other metadata are a common method to facilitate a greater understanding of software objectives, strengths, weaknesses, and prerequisites. Moreover, when program headers are standardized, they are not only more readable to developers but also to parsing algorithms that automatically extract metadata for analysis or archival. In addition to those included in program headers, comments throughout software can be parsed and extracted when constructed in a standardized format. This text introduces a standardized commenting methodology that enables both qualitative and quantitative comments to be parsed from SAS® software headers and body. A configuration file defines comment formatting and content and provides a flexible, scalable, reusable SAS macro-based solution. This text demonstrates one use case of this methodology in which software technical debt and risk are assessed via both qualitative (e.g., risk description, proposed risk resolution) and quantitative (e.g., risk severity, risk probability, likelihood of risk discovery, ease of risk mitigation) metadata and metrics included within SAS comments. The comment interpreter dynamically identifies and parses all SAS program files and SAS Enterprise Guide project files (including embedded SAS programs therein) within one or more folders to produce a comprehensive risk register for unlimited programs. This data-driven documentation, generated with push-button simplicity, enables SAS practitioners to better understand and make decisions about project and program risk and technical debt.


BL-144 : Become a Data-Driven Organization with People, Process and Technology
Tho Nguyen, Teradata

Data is a differentiator and an asset for making decisions. As an industry, we are data rich but knowledge poor because organizations are unable to make sense of all the data they collect. We are barely scratching the surface when it comes to analyzing all of the data that we have. In addition, analyzing the data has become much more complex and time-consuming, and companies may not have the right people, process, or technology to do the job effectively and efficiently. As data volumes continue to grow, it is imperative to have the proper people, process, and technology to become a data-driven organization.


BL-146 : Comparing SAS® Viya® and SAS® 9.4 Capabilities: A Tale of Two SAS Platform Engines
Amy Peters, SAS

SAS® Viya® extends the SAS® Platform in a number of ways and has opened the door for new SAS® software to take advantage of its capabilities. SAS® 9.4 continues to be a foundational component of the SAS Platform, not only providing the backbone for a product suite that has matured over the last forty years, but also delivering direct interoperability with the next generation analytics engine of SAS Viya. Learn about the core capabilities shared between SAS Viya and SAS 9.4, and about where they are unique. See how the capabilities complement each other in a common environment, and understand when it makes sense to choose between the two and when it makes sense to go with both. In addition to these core capabilities, see how the various SAS software product lines stack up in both, including analytics, visualization, and data management. Some products, like SAS® Visual Analytics, have one version aligned with SAS Viya and a different version with SAS 9.4. Other products, like SAS® Econometrics, leverage the in-memory, distributed processing of SAS Viya, while at the same time including SAS 9.4 functionality like Base SAS® and SAS/ETS® software. Still other products target one engine or the other. Learn which products are available on each, and see functional comparisons between the two. In general, gain a better understanding of the similarities and differences between these two engines behind the SAS Platform, and the ways in which products leverage them.


BL-147 : Data For Good as a Community Service Project at Work
David Corliss, Peace-Work

Many businesses, large and small, support volunteering within the community, often sponsoring volunteer work days where employees can attend a well-organized activity for a group doing good work in the community. Today, Data For Good volunteering offers an opportunity to supply a critical skill to charitable organizations making a difference. This presentation is designed to help business leaders set up a Data For Good project as a community service and team-building activity, with best practices for finding a good organization to support, recruiting participants, managing the volunteer work day, and sharing the story with the wider community. Descriptions of successful projects and how to do them include building a membership database, data dive events to develop recruiting models for community service groups, seasonal optimization of resources (e.g., food pantries), and others. Practical, proven processes are presented to make Data For Good your company's next community service project.


Hands-on Workshops

HW-009 : A Hands-on Introduction to SAS® Metadata DICTIONARY Tables and SASHELP Views
Kirk Paul Lafler, Software Intelligence Corporation

SAS® users can easily and quickly access metadata content with a number of read-only SAS data sets called DICTIONARY tables or their counterparts, SASHELP views. During a SAS session, information (known as metadata) is captured including SAS system options along with their default values, assigned librefs, table names, column names and attributes, formats, indexes, and more. This hands-on workshop introduces how metadata can be used as input into a SAS code generator or a SAS macro to produce the desired results, the application of specific DICTIONARY table and SASHELP view content, and an assortment of examples related to the creation of dynamic code.
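For example, both interfaces described above can list the tables in a library; DICTIONARY.TABLES is queried through PROC SQL, while the equivalent SASHELP.VTABLE view can feed a DATA step:

```sas
/* List the tables in the WORK library and their observation counts,
   first via the DICTIONARY table, then via the SASHELP view. */
proc sql;
   select memname, nobs
   from dictionary.tables
   where libname = 'WORK';
quit;

data work_tables;
   set sashelp.vtable;
   where libname = 'WORK';
   keep memname nobs;
run;
```

The same pattern generalizes to DICTIONARY.COLUMNS, DICTIONARY.OPTIONS, and the other metadata tables the workshop covers.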


HW-033 : Animate Your Data!
Richann Watson, DataRich Consulting
Kriss Harris, SAS Specialists Ltd

When reporting your safety data, do you ever feel sorry for the person who has to read all the laboratory listings and summaries? Or have you ever wondered if there is a better way to visualize safety data? Let's use animation to help the reviewer and to reveal patterns in your safety data, or in any data! This hands-on workshop demonstrates how you can use animation in SAS® 9.4 to report your safety data, using techniques such as visualizing a patient's laboratory results, vital sign results, and electrocardiogram results and seeing how those safety results change over time. In addition, you will learn how to animate adverse events over time, and how to show the relationships between adverse events and laboratory results using animation. You will also learn how to use the EXPAND procedure to ensure that your animations are smooth. Animating your data will bring your data to life and help improve lives!
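The PROC EXPAND smoothing mentioned above can be sketched roughly as follows (PROC EXPAND requires SAS/ETS; the data set and variable names here are hypothetical, not the authors' actual code). Interpolating sparse monthly lab values onto a daily grid gives the animation more frames to step through:

```sas
/* Hypothetical sketch: spline-interpolate monthly lab results to daily
   values so animation frames transition smoothly between visits. */
proc expand data=labs out=labs_smooth from=month to=day;
   by subjid;                       /* one series per subject        */
   id visit_date;                   /* SAS date identifying each row */
   convert lbstresn / method=spline;
run;
```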


HW-055 : Understanding Administrative Healthcare Data sets using SAS programming tools
Jayanth Iyengar, Data Systems Consultants LLC

Changes in the healthcare industry have highlighted the importance of healthcare data. The volume of healthcare data collected by healthcare institutions, such as providers and insurance companies, is massive and growing exponentially. SAS programmers need to understand the nuances and complexities of healthcare data structures to perform their responsibilities. There are various types and sources of administrative healthcare data, including healthcare claims (Medicare, commercial insurance, and pharmacy), hospital inpatient, and hospital outpatient data. This training seminar will give attendees an overview and detailed explanation of the different types of healthcare data, and the SAS programming constructs to work with them. The workshop will engage attendees with a series of SAS exercises involving healthcare datasets.


HW-065 : Getting Started with the SGPLOT Procedure
Josh Horstman, Nested Loop Consulting

Do you want to create highly-customizable, publication-ready graphics in just minutes using SAS? This workshop will introduce the SGPLOT procedure, which is part of the ODS Statistical Graphics package included in Base SAS. Starting with the basic building blocks, you'll be constructing basic plots and charts in no time. We'll work through several different plot types and learn some simple ways to customize each one.
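Getting started takes very little code; for instance, two of the basic plot types using the SASHELP.CLASS sample data set that ships with SAS:

```sas
/* Grouped scatter plot: height vs. weight, colored by sex. */
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
run;

/* Vertical bar chart of counts by age. */
proc sgplot data=sashelp.class;
   vbar age;
run;
```

Additional statements (TITLE, XAXIS/YAXIS, KEYLEGEND) layer customization onto the same basic skeleton.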


HW-098 : The Joinless Join ~ The Impossible Dream Come True Using SAS® Enterprise Guide® and Base SAS® PROC SQL and DATA Step; Expand the Power of SAS® Enterprise Guide® and Base SAS® in New Ways
Kent Phelps, Illuminator Coaching, Inc.
Ronda Phelps, Illuminator Coaching, Inc.

SAS Enterprise Guide and Base SAS can easily combine data from tables or data sets by using a PROC SQL Join to match on like columns or by using a DATA Step Merge to match on the same variable name. However, what do you do when tables or data sets do not contain like columns or the same variable name and a Join or Merge cannot be used? We invite you to attend our exciting Joinless Join Hands-On Workshop where we will empower you to expand the power of SAS Enterprise Guide and Base SAS in new ways by creatively overcoming the limits of a standard Join or Merge. You will learn how to design a Joinless Join based upon dependencies, indirect relationships, or no relationships at all between the tables or data sets using SAS Enterprise Guide and Base SAS PROC SQL and DATA Step. In addition, we will highlight how to use a Joinless Join to prepare unrelated joinless data to be utilized by ODS and PROC REPORT in creating a PDF. Come experience the power and versatility of the Joinless Join to greatly expand your data transformation and analysis toolkit.


HW-099 : The Baker Street Irregulars Investigate: Discoveries Using Perl Regular Expressions and SAS®
Peter Eberhardt, Fernwood Consulting Group Inc

A true detective needs the help of a small army of assistants to track down and apprehend the bad guys. Likewise, a good SAS® programmer will use a small army of functions to find and fix bad data. In this paper we will show how the small army of regular expressions in SAS can help you.
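A small illustration of the idea (the input data set and variable names are hypothetical): PRXMATCH returns the position of the first match, or 0 when the pattern is not found, which makes it a natural data-validation check:

```sas
/* Flag records whose phone number does not match a simple
   NNN-NNN-NNNN pattern; prxmatch returns 0 on no match. */
data checked;
   set raw_contacts;   /* hypothetical input data set */
   bad_phone = (prxmatch('/^\d{3}-\d{3}-\d{4}$/', strip(phone)) = 0);
run;
```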


Health Sciences

HS-027 : Tips, Tricks, and Traps on Longitudinal Data Analysis with Discrete and Continuous Times
Brandy Sinco, University of Michigan
Edie Kieffer, University of Michigan
Michael Spencer, University of Washington
Gray Ficker, CHASS Center
Gretchen Piatt, University of Michigan

When longitudinal data are collected at discrete time points, such as at baseline, 6 and 12 months, compared to continuous times, both exploratory data analysis and linear mixed models need to be modified. For data at discrete times, analysts can use Proc Corr to examine the correlation matrix by simply listing the variable names at each timepoint. In contrast, long datasets with continuous times must be transposed to a format that can be used with Proc Corr, by using the first and last functions. This presentation includes tips and tricks for viewing the empirical correlation structure when time is continuous. When using Proc Mixed for a linear mixed model, some correlation structures differ between models with discrete and continuous times. SAS offers correlation structures especially designed for continuous data, as well as structures that were designed for data with discrete times. One trap and trick is the Estimate statement in Proc Mixed. For data at discrete times, time point coefficients are easily included in the Estimate statement. However, for polynomial models that contain time raised to various powers, the proper coding of time can make a difference between getting an "Inestimable error" versus a useful estimate. This presentation features an example with diabetes intervention data collected over several years with a linear mixed model containing a third degree polynomial for time. When time was originally coded in months, the estimate statement in Proc Mixed produced an "Inestimable error". When time was re-coded in years, the estimate statement generated useful information.
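The re-coding trick can be sketched as below; the variable names and model are hypothetical stand-ins for the diabetes example, not the authors' actual code. Dividing months by 12 keeps the cubic time coefficients on a numerically stable scale for the ESTIMATE statement:

```sas
/* Re-code time from months to years before fitting the polynomial */
data long2;
   set long;
   yr = months / 12;
run;

proc mixed data=long2;
   class id group;
   model y = group yr yr*yr yr*yr*yr group*yr / solution;
   random intercept yr / subject=id;
   /* Group difference at yr = 1: coefficients are powers of 1 */
   estimate 'Group diff at 1 year' group 1 -1  group*yr 1 -1;
run;
```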


HS-036 : How to Navigate in a Maze of the Raking Macro with Advanced Weight Trimming
Michael Battaglia, Battaglia Consulting Group, LLC
David Izrael, Abt Associates
Sarah Ball, Abt Associates

Raking to population control totals is often the final step in developing survey weights. Raking is an iterative procedure that brings the weighted sample into agreement on socio-demographic variables that are available for the sample and the population. It is primarily used to reduce unit nonresponse bias. Raking can lead to some observations ending up with extreme weights; in other words, weights that are very large or very small compared to the mean weight, resulting in inflated standard errors. In 2009, we enriched a SAS® raking macro by implementing weight trimming during the raking iterations, ensuring that the weighted sample agreed with the population. We recently enhanced the macro further, adding several options related to weight trimming. Among them are two trimming methods - "AND" or "OR" - and an option that allows different convergence criteria to be set for a subset of the raking variables. This paper should help users to navigate among a number of options and parameters to more efficiently use the power of the raking macro with advanced weight trimming.


HS-038 : Assessing Model Adequacy in Proportional Hazards Regression
Michael G. Wilson, IUSM

Proportional Hazards regression has become an exceedingly popular procedure for conducting analysis on right-censored, time-to-event data. A powerful, numerically stable and easily generalizable model can result from careful development of the candidate model, assessment of model adequacy, and final validation. Model adequacy focuses on overall fitness, validity of the linearity assumption, inclusion of a correct (or exclusion of an incorrect) covariate, and identification of highly influential observations. Due to the presence of censored data and the use of the partial maximum likelihood function, the diagnostics to assess these elements can be slightly more complicated in proportional hazards regression than in most linear modeling exercises. In this paper, graphical and analytical methods using a rich supply of distinctive residuals to address these model adequacy challenges are compared.


HS-039 : Using SAS® to Validate Clinical Prediction Models
Xiaoting Wu, University of Michigan
Chang He, The Michigan Society of Thoracic and Cardiovascular Surgeons Quality Collaborative
Donald Likosky, Michigan Medicine

Model validation is an important step in establishing a clinical prediction model. The validation process quantifies how well the model predicts outcomes for future patients. However, there are very few SAS programming examples showing the validation process. We previously developed a generalized mixed effect model that predicts peri-operative blood transfusion from patient characteristics. In this paper, we demonstrate the SAS® techniques that we used to validate such a model. These validation methods include calibration, discrimination and sensitivity analysis using bootstrapping.


HS-045 : Pan-Cancer Epigenetic Biomarker Selection from Blood Sample Using SAS®
Xi Chen, University of Kentucky
Jin Xie, Department of Statistics, University of Kentucky
Qingcong Yuan, Department of Statistics, Miami University

A key focus in current cancer research is the discovery of cancer biomarkers that allow earlier detection with high accuracy and lower costs for both patients and hospitals. Blood samples have long been used as a health status indicator, but DNA methylation signatures in blood have not been appreciated in cancer research. Historically, analysis of cancer has been conducted directly with the patient's tumor or related tissues. Such analyses allow physicians to diagnose a patient's health and cancer status; however, physicians must observe certain symptoms that prompt them to use biopsies or imaging to verify the diagnosis. This is a post-hoc approach. Our study will focus on epigenetic information for cancer detection, specifically information about DNA methylation in human peripheral blood samples in cancer-discordant monozygotic twin pairs. This information might be able to help us detect cancer much earlier, before the first symptom appears. Several other types of epigenetic data can also be used, but here we demonstrate the potential of blood DNA methylation data as a biomarker for pan-cancer using SAS® 9.3 and SAS® EM. We report that 55 methylation CpG sites measurable in blood samples can be used as biomarkers for early cancer detection and classification. Keywords: SAS, Epigenetics, Cancer Detection, Cancer Biomarker, PCA, Statistical Learning, Machine Learning


HS-053 : Creating suicide attempt/intentional self-harm episodes using administrative billing data
Jamie Kammer, New York State Office of Mental Health
Mahfuza Rahman, NYS OMH
Qingxian Chen, NYS OMH

In identifying the services that individuals received in the time prior to and following a suicide attempt or intentional self-harm episode, it is important to separate out those services that occurred as part of treatment during the emergency room or inpatient visit (including transfers) from services outside of the episode. Specifically, it is useful to understand the service utilization patterns individuals experienced directly prior to a suicide attempt, whether individuals were engaged in outpatient care immediately following emergency room or inpatient treatment, and whether individuals experienced separate suicide attempts outside of an index episode. These insights can inform public health efforts to prevent and treat suicide attempts and intentional self-harm. There are several considerations to take into account when using administrative billing data for public health research and there is a need for standardizing methods around identifying episodes carefully. This paper describes one method for linking together rows of administrative billing data into continuous suicide attempt/intentional self-harm episode(s) with begin and end dates as a first step in service utilization analyses. SAS 9.4 was used.
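A minimal sketch of the row-linking step might use RETAIN and BY-group logic as below; the dataset and variable names are hypothetical, and real episode rules (transfers, gap lengths) would be more involved:

```sas
/* Collapse overlapping or adjacent service rows into episodes */
proc sort data=claims; by person_id svc_start; run;

data episodes;
   set claims;
   by person_id;
   retain epi_num epi_start epi_end;
   /* New episode when the next row does not touch (1-day gap allowed) */
   if first.person_id or svc_start > epi_end + 1 then do;
      epi_num + 1;
      epi_start = svc_start;
      epi_end   = svc_end;
   end;
   else epi_end = max(epi_end, svc_end);
   format epi_start epi_end date9.;
run;
```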


HS-071 : Leveraging SHEWHART Procedure Options to Monitor and Evaluate Improvements in Healthcare
Roderick Jones, Ann & Robert H. Lurie Children's Hospital of Chicago
Lynn (Xiaohong) Liu, Ann & Robert H. Lurie Children's Hospital of Chicago

In healthcare, the purpose of statistical process control (SPC) is often to quantify improvements and identify unintended consequences resulting from an intentional change in an environment, policy, treatment protocol, or decision-support tool. SAS/QC facilitates the production of statistics and their visualization through the SHEWHART procedure. Unlike in manufacturing, process change - rather than stability - is commonly sought, and interventions might be frequent and staggered over time. Using PCHART (for a proportion metric) and XSCHART (for a mean time interval) as examples, we describe approaches to 1) defining a baseline period, and extending its centerline prospectively to apply special cause variation tests against it; 2) removing (or "ghosting") from baseline calculations any subgroups that are considered the result of special cause variation; 3) using TESTNMETHOD=STANDARDIZE when subgroup sample sizes are not constant; 4) using the TESTACROSS option to detect variation spanning phases; 5) leveraging the contents of output datasets.
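A bare-bones p-chart invocation using some of the options named above might look like this; the dataset, variables, and test selection are illustrative only:

```sas
/* p-chart of monthly event proportions with varying subgroup sizes */
proc shewhart data=infections;
   pchart events*month / subgroupn=ncases        /* unequal n per month */
                         tests=1 to 4            /* special-cause tests */
                         testnmethod=standardize /* standardize for unequal n */
                         outlimits=work.limits;  /* save control limits */
run;
```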


HS-073 : A Macro to Import Subject Data Saved in a Location with Separate Subfolders for each Subject
Laurie Smith, Cincinnati Children's Hospital Medical Center

Often, in the health sciences, subject data is saved as one file per subject. When subject data is saved in separate files, it is often difficult to import each file into SAS® without great effort. If the data is organized with the subject ID as the folder name and each subject's data in the corresponding folder, this macro allows a programmer with basic SAS® programming skills to read the subject IDs from the folder names and loop through each subject's folder, importing all data within each folder, using SAS® v9.4 for Windows.
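The core of such a macro can be sketched with the DOPEN/DREAD directory functions; the macro name, &root parameter, and the assumption that each folder holds a data.csv file are all hypothetical:

```sas
/* Loop over subject-ID subfolders and import each subject's file */
%macro import_subjects(root=);
   %let rc  = %sysfunc(filename(fref, &root));
   %let did = %sysfunc(dopen(&fref));
   %do i = 1 %to %sysfunc(dnum(&did));
      %let subj = %sysfunc(dread(&did, &i));   /* folder name = subject ID */
      proc import datafile="&root/&subj/data.csv"
                  out=work.s&subj dbms=csv replace;
      run;
   %end;
   %let rc = %sysfunc(dclose(&did));
%mend import_subjects;
```

Subject IDs beginning with a digit would need a valid-name prefix, as shown with the `s` above.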


HS-074 : Use of SAS Macros to automate the Production of Statistical Process Control Charts
Lynn (Xiaohong) Liu, Ann & Robert H. Lurie Children's Hospital of Chicago
Roderick Jones, Ann & Robert H. Lurie Children's Hospital of Chicago

The SAS/QC SHEWHART procedure generates statistics and statistical process control (SPC) charts used to measure improvement and special cause variation in a process. To produce output automatically on a repeating schedule for a large number of metrics, a system of sequential SAS programs with macro processing was developed and implemented. The foundation for the process is a parameter file, which stores information including each metric's definition, record-level dataset, variable name and label, temporal unit of analysis, starting point of the time period to be analyzed, and SPC chart type. The parameter file is imported and each row (or "run") is converted into macro variables using %LET and %SYSFUNC. CALL SYMPUT, SQL SELECT INTO and %SYMEXIST assign or detect macro variables in real time, which allows a dynamic response from the system. The system of SAS programs acts as a series of pathways, with each run directed according to its macro variables using %IF - %THEN/%ELSE logic. Each pathway has code to read record-level datasets, calculate the necessary summary statistics, and output the results as datasets using ODS OUTPUT for the formatting needed prior to running the SHEWHART procedure. PROC SHEWHART is applied iteratively, the number of times depending on OUTTABLE and OUTHISTORY dataset contents and the detection of shifts that require new phases to be defined. With a %DO loop, the pathways culminate in the generation and delivery to document libraries of SPC chart image files, a metric description image file, and an Excel summary statistics file for each run.
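The parameter-file-to-macro-variable step might be sketched as follows; the file, column, and program names are hypothetical placeholders for the system described:

```sas
/* Load one run's parameters into macro variables, then branch */
proc sql noprint;
   select metric_name, chart_type, start_date
     into :metric trimmed, :ctype trimmed, :start trimmed
     from work.params
    where run_id = &run;
quit;

%macro route;
   %if &ctype = PCHART %then %do;
      %include "pchart_path.sas";     /* hypothetical pathway program */
   %end;
   %else %if &ctype = XSCHART %then %do;
      %include "xschart_path.sas";
   %end;
%mend route;
%route
```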


HS-079 : SAS Text-mining tools applied to Medical Information Assessment: ICD-10 code retrieval
Mohsen Asghari, Computer Engineering and Computer Science Department, University of Louisville
Daniel Sierra-Sosa, Computer Engineering and Computer Science Department, University of Louisville
Adel S. Elmaghraby, Computer Engineering and Computer Science Department, University of Louisville

The patient diagnosis data recorded in Electronic Health Records (EHR) are usually in free-text format. Applying machine-learning techniques to this data is challenging; one example is using natural language processing (NLP) to extract diagnostic codes. An important code to extract is the International Classification of Diseases (ICD) code, which has multiple versions. This code provides valuable information for medical information assessment. Currently, in order to understand a patient's status, there is a need to read the medical reports and notes and then consult the tabular or alphabetic sections of the ICD, an inconvenient process for medical personnel and users. In the US, as of 2014, the transfer from ICD-9 to ICD-10 had been delayed by 10 years. Coders must be able to use ICD-9 quickly and efficiently, and although the ICD-10 upgrade is looming in the US, they should also be able to use the new set of codes effectively. There is a need to address the challenges that this transition brings to coders. We propose the usage of SAS Text-mining tools to extract ICD codes from a medical record. We focused on creating a systematic coding tool for ICD-10 recognition and extraction based on free-text inputs from medical data. Our results using the MIMIC version III database are promising and are reported in detail.


HS-080 : A Data Mining Approach to Predict Dental Adverse Events
Sunil Kumar, Lead Data Scientist

Identifying patients at elevated dental risk is one of the most important topics in the dental care industry. In this paper, a data mining approach is discussed to predict patients at elevated dental risk. SAS® Enterprise Guide and SAS® Enterprise Miner were used for data manipulation and survival modeling.


HS-083 : A Macro to Calculate Sample Size for Studies Using the Proportional Time Assumption
Brian Mosier, University of Kansas Medical Center
John Keighley, University of Kansas Medical Center
Milind Phadnis, University of Kansas Medical Center

Sample size calculations for time-to-event outcomes are done mostly based on the assumption of proportional hazards or of exponentially distributed survival times. These assumptions are not appropriate for all scenarios and should not be implemented if the assumptions are not met. Phadnis et al.1 introduce an alternative method using the assumption of proportional time, employing the generalized gamma ratio distribution to calculate sample size. We developed a macro to calculate the sample size needed for studies using the proportional time assumption for a given value of power in an efficient way. The macro automates the method from the paper to simulate survival data for two treatment arms with the test statistic following a generalized gamma ratio distribution. It then utilizes the bisection method in order to find the sample size needed for the power input by the user along with additional parameters. We have implemented various features in the macro, allowing for one- or two-sided tests and an option that graphs the power function. This macro is a tool statisticians can use to make sample size calculations for studies using the proportional time assumption when some form of historical information is available from a prior study. 1Phadnis MA, Wetmore JB, Mayo MS. A clinical trial design using the concept of proportional time using the generalized gamma ratio distribution. Statistics in Medicine. 2017;36:4121-4140 https://doi.org/10.1002/sim.7421


HS-088 : Baseline Mean Centering for Analysis of Covariance (ANCOVA) Method of Randomized Controlled Trial Data Analysis
Jennifer Scodes, New York State Psychiatric Institute

Many analytical approaches exist to compute treatment effects and within-group changes from baseline for data analysis of randomized controlled trials with multiple follow-up visits. One of these approaches is the analysis of covariance (ANCOVA) method, in which baseline values are included as a covariate instead of as an outcome. Using the ANCOVA method, the treatment effects can be easily computed from model estimates; however, within-group changes from baseline cannot be directly computed in SAS procedures without centering the outcome measures by the overall baseline mean. This paper will present a macro that can be used to analyze data from two-arm randomized controlled trials using the ANCOVA method to compute and present both treatment effects and within-group changes from baseline using baseline mean centering. This paper is intended for all levels of SAS users that analyze clinical trial data.


HS-094 : Genocide Modeling - Historical Risk Factors and Odds Ratios
David Corliss, Peace-Work

This analysis identifies risk factors associated with genocide events. A review of historical conflicts where genocide was present in some and not others provided the data. Using these data, Decision Tree and Random Forest models identify variables with measurable association with genocide events. Logistic Regression and Decision Tree methods are applied to the screened list of variables. Odds ratios are calculated to assess the relative risk of different factors. These models are used to assess the relative likelihood of genocide occurring or developing in the coming year in various countries.


HS-118 : Bootstrap Linear Mixed-Effects Models using SAS Procedures
Adams Kusi Appiah, University of Nebraska Medical Center

The bootstrap resampling technique is a general method for estimating the sampling distribution of a statistic of interest and is applied in many research applications. It can obtain more robust parameter estimates and confidence intervals in situations where no assumptions about the underlying distribution of the model are available. However, the main concern of the bootstrap method is how to generate a bootstrap distribution to resemble the true distribution of the observed data. In the context of linear mixed effects models, the distribution of samples should be generated to account for between-subject variability and residual variability in the data. The bootstrap method will be applied with various ways to resample data for linear mixed effects models using the SURVEYSELECT and MIXED procedures available with SAS®/STAT software. These methods will be applied to assess the uncertainty of parameter estimates in linear mixed effect models with data from the National Cooperative Gallstone Study (NCGS). The results by maximum likelihood (ML), restricted maximum likelihood (REML), and the bootstrap methods are compared. The parametric, semiparametric, and non-parametric bootstrap methods for generating samples and estimating the parameters are discussed.
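A nonparametric, subject-level (cluster) bootstrap along these lines might be sketched as below; the dataset and model are hypothetical stand-ins for the NCGS analysis:

```sas
/* Resample whole subjects with replacement, 500 replicates */
proc surveyselect data=ncgs out=boot
     method=urs samprate=1 outhits reps=500 seed=2018;
   cluster id;
run;

/* Refit the mixed model within each replicate.
   Caveat: with OUTHITS, repeated draws of a subject share the same
   id and should be given distinct ids before fitting. */
proc mixed data=boot;
   by replicate;
   class id;
   model y = time trt / solution;
   random intercept / subject=id;
   ods output SolutionF=boot_est;   /* fixed effects per replicate */
run;
```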


HS-119 : Automated Transfer of a Sea of SAS® Programs between Data Transfers
Xuelin Li, Eli Lilly and Company
Jameson Cai, Eli Lilly and Company
Cindy Lee, Eli Lilly and Company

In the pharmaceutical industry, a huge number of SAS programs are rerun routinely to refresh SDTM/ADaM data sets and TFLs (Tables, Figures, Listings) in different folder locations to accommodate data transfers. Manually updating the paths within the programs can be tedious. In this paper, we develop a method to move the SAS programs to the new location with automated path updates within the programs. In our approach, we first use the PIPE engine in the FILENAME statement to collect the file names of all the SAS programs in a location and create a macro variable for the list of file names. Secondly, we use the INFILE statement to create a temporary dataset by reading each SAS program. Thirdly, from each temporary dataset we use the FILE statement to write a new SAS program in the new location with an updated path name. This method facilitates the process of moving SAS programs from one location to another seamlessly and saves hours of programmers' time in updating every program. More importantly, this automation eliminates the chance of human error. The work involved in this abstract can be done using SAS version 9. The audience for this presentation is expected to have advanced SAS skills.
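The three steps can be sketched as follows; the old/new locations are hypothetical Windows paths, and the real method additionally builds a macro variable list of file names:

```sas
/* Step 1: PIPE engine lists the programs in the old location */
filename plist pipe 'dir /b "C:\old\*.sas"';

data pgms;
   infile plist truncover;
   input pgm $256.;
run;

/* Steps 2-3: read each program and write it to the new location
   with the path text updated */
data _null_;
   set pgms;
   length inname outname $300 line $32767;
   inname  = cats('C:\old\', pgm);
   outname = cats('C:\new\', pgm);
   infile in  filevar=inname end=eof truncover lrecl=32767;
   file   out filevar=outname lrecl=32767;
   do while (not eof);
      input line $char32767.;
      line = tranwrd(line, 'C:\old\', 'C:\new\');  /* update paths */
      put line;
   end;
run;
```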


HS-123 : A Macro to Add SDTM Supplemental Domain to Standard Domain
Pratap Kunwar, EMMES Corp

Many pharmaceutical and biotechnology companies now prefer to set up Study Data Tabulation Model (SDTM) mapping at the beginning of a study rather than at the end, and to use SDTM datasets to streamline the flow of data from collection through submission. With SDTM datasets at their disposal, it is a logical choice to use them for any clinical reports. Getting information from a supplemental (SUPP) domain back into its parent domain is a regular step that programmers cannot avoid. However, this step can be very tricky when either (1) the SUPP domain contains multiple types of identifying variables, or (2) the SUPP domain is empty or does not exist. In this presentation, I will present an easily understandable macro that produces correct results in every possible scenario.
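The underlying step the macro automates can be sketched for the AE/SUPPAE pair, assuming here that IDVAR is AESEQ; handling other identifying variables and empty or missing SUPP datasets is exactly what the full macro adds:

```sas
/* Transpose SUPPAE to one row per subject/record, then merge back */
proc sort data=suppae; by usubjid idvarval; run;

proc transpose data=suppae out=supp_t(drop=_name_ _label_);
   by usubjid idvarval;
   id qnam;
   var qval;
run;

proc sql;
   create table ae_full as
   select a.*, s.*
   from ae as a
   left join supp_t as s
     on a.usubjid = s.usubjid
    and a.aeseq   = input(s.idvarval, best32.);  /* IDVARVAL is character */
quit;
```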


HS-132 : Toward Adoption of Agile Software Development in Clinical Trials
Troy Hughes, Datmesis Analytics

Agile methodologies for software development, including the Scrum framework, have grown in use and popularity since the 2001 Manifesto for Agile Software Development. More than having obtained ubiquity, Agile demonstrably has defined software development in the 21st century with its core foci in collaboration, value-added software, and flexibility gained through incremental development and delivery. Although Agile principles can easily be extrapolated to other disciplines, Agile-related nomenclature, literature, application, and employment descriptions often remain focused on software development alone. In SAS® data analytic development-which often typifies software development that occurs within the clinical trials/pharmaceutical industry-developers and other practitioners also build complex, enduring software and data infrastructure, but often for their own or their team's use, and usually with the intent of transforming data into knowledge and data-driven decisions. And, because these outcomes are more highly valued than their underlying code, clinical trials organizations are more likely to focus on data quality than code quality. Thus, because software development methodologies such as Agile focus on programming and the programming environment and process, they can be overlooked by clinical trials organizations who view software development as a tool, not a product or outcome. Notwithstanding, every tool deserves to be wielded effectively and, to that end, this text introduces Agile development for use in clinical trials organizations. Moreover, the paucity of reference to Agile or any software development life cycle (SDLC) or development methodology within clinical trials is demonstrated through examination of SAS® user-published white papers, SAS® Institute books, and clinical trials employment postings.


HS-136 : Addressing Opioid Crisis using Data Science
Rishabh Mishra, Oklahoma State University

Opioids are medications commonly prescribed by doctors, mainly for treating acute and chronic pain. They are highly addictive, and patients tend to become tolerant to the drug after a certain point in time. This means that patients either have to increase the dosage of the drug or stop taking it, and both choices have their own set of disadvantages. On one hand, an overdose of these drugs can be fatal; on the other hand, stopping these drugs can cause severe withdrawal symptoms and recurrence of pain. In this paper, we use opioid data to predict the death ratio in each state across the United States. We use prescription data, patient survey data and death data for this analysis. If we can accurately predict the states with high death rates, then it is possible that governmental action can be taken to avoid deaths in those states. From the www.cdc.gov website, we obtained an overdose dataset, which contains prescriptions made by various physicians, and death rates by state for all deaths caused by opioid overdose. This project uses SAS Enterprise Guide and SAS Enterprise Miner to conduct predictive analysis using methods like decision trees, logistic regression, and random forests.


HS-142 : Spotfire Clinical visualizations from SAS and R
Zeqing Lu, Eli Lilly
Hillary Graham, Eli Lilly
Jessica Chen, Eli Lilly

Data visualization is an innovative way to make our business decision-making process more intuitive and efficient. With this emphasis in mind, our project focuses on the interactive capabilities of the popular visualization tools Spotfire, SAS and R. Spotfire is a very powerful data analytics tool. Not only does Spotfire enable us to create various analyses using its robust visualization layout, it is also surprisingly user-friendly. Its intuitive user interface, potent visualization prowess, and quick subgroup filters make it an indispensable tool for the optimization of drug development. In the past, we could only present screenshots of Kaplan-Meier and forest plots in data review meetings. Statistical tests and visualizations that Spotfire does not currently support can easily be done in SAS and R. SAS datasets can be directly imported into Spotfire and linked with safety/efficacy datasets. Then we can create visualizations based on the SAS-generated datasets. In addition, Spotfire is equipped with TERR, the Tibco Enterprise Runtime for R, which allows Spotfire to access and run the R application within its own interface. R enables us to compute the relevant statistical tests to display alongside the visualizations. Using these features, we were able to create on-demand interactive visualizations to present subgroup efficacy analyses. Exploratory analyses can be viewed before they are formalized, so fewer ad hoc TFLs are needed, which saves cost. In addition, the efficacy templates can improve team engagement during data review meetings, which leads to faster business decisions.


HS-143 : AutoPDF : an R Package to Output Vector Graphics
Hillary Graham, Eli Lilly
Zeqing Lu, Eli Lilly
Michelle Carlsen, Eli Lilly

Many statistical analysts have been outputting graphics in RTF format. In pharmaceutical companies, RTF files are difficult to alter for publication or submission because they are not an editable file type. On the other hand, vector-based files enable users to modify colors, sizes, labels, etc. in Adobe Illustrator before disclosure. Currently, R has the capability to create vector graphics in PDF format. However, the process of outputting graphics in PDF format from R requires several steps. This can be irritating for experienced users, and discouraging for new users. In order to make the process easier, we have created an R package to output graphics into a PDF. This will automate the creation of vector-based graphics in R, making it more efficient for both new and experienced R-users to prepare plots ready for disclosure.


SAS 101 Plus

SP-002 : Making Your SAS® Output, Results, Reports, Charts and Spreadsheets More Meaningful with Color
Kirk Paul Lafler, Software Intelligence Corporation

Color can help make SAS® output, results, reports, charts and spreadsheets more professional and meaningful. Instead of producing boring and ineffective results, users can enhance the appearance of their output to highlight and draw attention to important data elements and issues, including headings, subheadings, footers, minimum and maximum values, ranges, outliers, special conditions, and other elements. Color can be added to text, foreground, background, row, column, cell, summary, and total with amazing color and traffic-lighting scenarios. Topics include an assortment of examples illustrating the various ways output, documents, reports, charts and spreadsheets can be enhanced with color, and how to effectively add color to PDF, RTF, HTML, and Excel spreadsheet results using PROC PRINT, PROC REPORT, PROC TABULATE, and PROC SGPLOT and the Output Delivery System (ODS) with style.
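One traffic-lighting pattern of the kind the paper covers can be sketched with a user format driving cell background colors; the thresholds and colors are illustrative:

```sas
/* Format maps value ranges to background colors */
proc format;
   value hilite low-<1000  = 'salmon'
                1000-<5000 = 'lightyellow'
                5000-high  = 'lightgreen';
run;

ods html file='sales.html';
proc report data=sashelp.shoes(obs=20);
   column region product sales;
   define sales / analysis sum
          style(column)={background=hilite.};  /* color by value */
run;
ods html close;
```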


SP-048 : Using Multilabel Formats with PROC SUMMARY to Generate Report Data with Overlapping Time Segments
John Schmitz, Luminare Data

SAS introduced the multi-label format (MLF) in Version 8. Yet, few users are familiar with the MLF or its unique capabilities. MLFs are used for data summarization where the same observation may be classified into 2 or more levels, simultaneously. This paper shows how multi-label formats can be used to generate time segments with overlapping periods. Core steps include creating the multi-label format definition, applying MLFs to CLASS variables within PROC SUMMARY, and properly understanding the results.
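A minimal sketch of the idea: one date falls into several overlapping periods at once, so each period total includes it (dataset and period definitions are invented examples):

```sas
/* Overlapping time segments defined in a multilabel format */
proc format;
   value mlperiod (multilabel)
      '01JAN2018'd - '31MAR2018'd = 'Q1 2018'
      '01JAN2018'd - '30JUN2018'd = 'H1 2018'
      '01JAN2018'd - '31DEC2018'd = 'Year 2018';
run;

proc summary data=visits nway;
   class visit_date / mlf;          /* honor the multilabel format */
   format visit_date mlperiod.;
   var charge;
   output out=byperiod sum=;
run;
```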


SP-049 : Data-driven Data Analysis
Jack Shoemaker, Texture Health

When confronted with a new data channel, the modern data scientist or analyst will employ sophisticated data visualization tools like Visual Analytics to size up the data. Not all users have access to these tools and must rely on more pedestrian code-based approaches. This paper explores techniques using Base SAS to provide data-driven data analysis to help size up data absent the more modern tools. These techniques leverage the details about data available from the dictionary subsystem. Knowing the names, formats, and data types of the data allows one to derive great insight into the content of the data stream.


SP-057 : A Macro that Can Get the Geo Coding Information from the Google Map API
Ting Sa, Cincinnati Children's Hospital Medical Center

This paper introduces a macro that can automatically get geocoding information from the Google Maps API for the user. The macro can get the longitude, latitude, standardized address, and address components like street number, street name, county or city name, state name, and zip code. To use the macro, the user only needs to provide a simple SAS input data set; the macro will then automatically retrieve the data and save it into a SAS data set. The paper includes all the SAS code for the macro and provides an input data example to show you how to use the macro.
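A single call of the kind such a macro wraps might be sketched with PROC HTTP against the Google Maps Geocoding API; YOUR_KEY is a placeholder, and the JSON libname engine shown requires SAS 9.4M4 or later:

```sas
/* One geocoding request; the macro would loop this over input rows */
filename resp temp;
proc http
   url='https://maps.googleapis.com/maps/api/geocode/json?address=Cincinnati,OH&key=YOUR_KEY'
   method="GET" out=resp;
run;

libname g json fileref=resp;   /* parse the JSON reply */
/* Inspect the datasets the engine creates to locate lat/lng and
   the address components */
proc datasets lib=g nolist;
   contents data=_all_;
quit;
```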


SP-061 : Seeing the Forest for the Trees: Part Deux of Defensive Coding by Example
Donna Levy, Syneos Health
Nancy Brucken, Syneos Health

As statisticians and programmers, SAS® is part of our daily life. Through assessing patterns, data quality, programming datasets, analysis displays or developing simulations, we need to determine the best ways to conduct our daily work, allowing us to see the forest for the trees. This paper provides guidance on quality defensive programming, efficient coding as well as good programming concepts. Programming no-no's will also be discussed. The concepts discussed will allow us to navigate through the trees --- that is, seeing the trees for the forest. We may have been programming in SAS for weeks, months, years or decades. Regardless, we should continue to expand our skills and continue learning and updating our techniques. With this paper, we will provide reminders for paths lost in the past, as well as new tips to help us clear the brush from the trail. This paper is part deux of Defensive Coding by Example (Brucken and Levy, 2015), quenching our thirst for adventure in the great SAS hinterland.


SP-062 : Efficiency Programming with Macro Variable Arrays
Veronica Renauldo, QST Consultations

Macros in themselves boost productivity and cut down on user errors. However, most macros are not robust and serve only a few specific repetitive purposes. Just like arrays increase the efficiency of a DATA step, macro variable arrays increase the efficiency of a macro. Macro variable arrays allow the macro to function more autonomously than what is typical for macro processing and work in all SAS® platforms that support macro processing. Automating the process of determining the number of times a macro needs to be utilized for a task is just one of the several applications of macro variable arrays. There are numerous ways to create macro variable arrays, such as %LET statements, PROC SQL, and CALL SYMPUT statements, each with their own user-friendly approach. Macro variable arrays employ the use of loops and logic to construct comprehensive macros allowing for a multitude of output types functioning within one macro call. Constructing dynamic macros will increase the capacity of a macro while dramatically decreasing the lines of code in each program. In conjunction with macro functions such as %SYSFUNC, %SCAN, and %STR, macro variable arrays allow the creator and user of a macro to be more flexible with their coding, ultimately leading to more productivity with fewer code alterations. Impress your boss, your friends, and yourself with macro code that almost writes itself.
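One of the creation methods named above, PROC SQL SELECT INTO, can be sketched like this; the "print every WORK dataset" task is an invented example:

```sas
/* Build a macro variable array of dataset names, count with &sqlobs */
proc sql noprint;
   select memname into :ds1- :ds999
   from dictionary.tables
   where libname = 'WORK';
   %let nds = &sqlobs;
quit;

/* Loop over the array with indirect (&&) resolution */
%macro print_all;
   %do i = 1 %to &nds;
      title "Listing of work.&&ds&i";
      proc print data=work.&&ds&i (obs=5); run;
   %end;
%mend print_all;
%print_all
```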


SP-063 : Dating for SAS Programmers
Josh Horstman, Nested Loop Consulting

Every SAS programmer needs to know how to get a date... no, not that kind of date. This paper will cover the fundamentals of working with SAS date values, time values, and date/time values. Topics will include constructing date and time values from their individual pieces, extracting their constituent elements, and converting between various types of dates. We'll also explore the extensive library of built-in SAS functions, formats, and informats for working with dates and times using in-depth examples. Finally, you'll learn how to answer that age-old question... when is Easter next year?
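A brief sketch of the fundamentals the abstract covers (constructing, decomposing, and converting date and datetime values), including the age-old Easter question:

```sas
/* Construct, decompose, and convert SAS date and datetime values.    */
data date_demo;
   d  = mdy(10, 1, 2018);             /* build a date from its parts  */
   yr = year(d);                      /* extract a constituent element*/
   dt = dhms(d, 14, 30, 0);           /* date -> datetime value       */
   back = datepart(dt);               /* datetime -> date             */
   next_easter = holiday('EASTER', year(today()) + 1);
   format d back next_easter date9. dt datetime20.;
run;
```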


SP-066 : Merge with Caution: How to Avoid Common Problems when Combining SAS Datasets
Josh Horstman, Nested Loop Consulting

Although merging is one of the most frequently performed operations when manipulating SAS data sets, there are many problems which can occur, some of which can be rather subtle. This paper examines several common issues, provides examples to illustrate what can go wrong and why, and discusses best practices to avoid unintended consequences when merging.
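As a hedged illustration of the defensive pattern such papers recommend (data set and variable names are hypothetical): sort both inputs by the key, then use IN= flags to control exactly which observations survive the merge.

```sas
/* Sort both inputs by the BY key before merging. */
proc sort data=demog;  by subjid; run;
proc sort data=visits; by subjid; run;

data combined;
   merge demog (in=in_d) visits (in=in_v);
   by subjid;
   if in_d and in_v;                  /* keep matched subjects only   */
run;
```

Omitting the BY statement, or merging data sets that share non-BY variable names, are two of the subtle problems that can silently corrupt results.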


SP-069 : What's the Difference? Using the PROC COMPARE to find out.
Larry Riggen, Indiana University

We are often asked to determine what has changed in a database. There are many tools that can provide a list of before and after differences (e.g. Redgate Data Compare), but SAS PROC COMPARE can be coupled with other tools in base SAS to analyze the changes. This paper will explore using the output file produced by PROC COMPARE and the SAS Macro language to produce spreadsheets of detailed differences and summaries to perform this task.
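A minimal sketch of the starting point the abstract describes, capturing only the differing observations in an output data set for downstream processing (data set and ID names are illustrative):

```sas
/* Write unequal observations from both data sets to DIFFS for
   further analysis or export to a spreadsheet.                       */
proc compare base=before compare=after
             out=diffs outnoequal outbase outcomp noprint;
   id subjid;
run;
```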


SP-075 : From Clicking to Coding: Using ODS Graphics Designer as a Tool to Learn Graph Template Language
Margaret Kline, Grand Valley State University
Daniel Muzyka, Grand Valley State University

ODS Graphics Designer brings simple graphics creation to SAS platforms 9.2 and later. This application enables any novice user who can navigate an interactive point-and-click menu to generate highly customizable graphical representations. ODS Graphics Designer, which functions in conjunction with the suite of SAS products, can be invoked to facilitate the creation of Graph Template Language (GTL) through a non-intimidating interface. Not only can the generated code be edited later, but the instant gratification of a striking graphic display can encourage novice users to continue expanding their SAS skills or ease their transition from other software. The untapped potential of ODS Graphics Designer as an educational tool lies in its ability to acquaint users with the underlying syntax of GTL. This paper describes how to access and navigate the user interface, provides examples of generated and edited code, and discusses potential uses and limitations to showcase ODS Graphics Designer as a pedagogical tool for beginner to intermediate programmers.


SP-076 : Tips, Traps, and Techniques in BASE SAS for vertically combining SAS data sets
Jayanth Iyengar, Data Systems Consultants LLC

Although not as frequent as merging, a data manipulation task which SAS programmers are required to perform is vertically combining SAS data sets. The SAS system provides multiple techniques for appending SAS data sets, also known as concatenating or stacking. There are pitfalls and adverse data quality consequences to using traditional approaches to appending data sets, as well as efficiency implications with the different methods. In this paper, with practical examples, I examine the technical procedures that are necessary to prepare data to be appended. I also compare the different methods available in BASE SAS to append SAS data sets, based on efficiency criteria.
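Two of the traditional appending techniques such comparisons typically cover, shown as a hedged sketch with illustrative data set names:

```sas
/* SET rewrites the entire result; PROC APPEND reads only the new
   rows and adds them to the base data set in place.                  */
data all_months;
   set jan feb mar;                   /* concatenate via DATA step    */
run;

proc append base=all_months data=apr force;
run;
```

The FORCE option lets PROC APPEND proceed when variable attributes differ, which is exactly the kind of silent data quality consequence the paper examines.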


SP-078 : Improving Plots Using XAXISTABLE and YAXISTABLE
Jacob Keeley, Grand Valley State University
Carl Nord, Eli Lilly and Company

New to the SGPLOT procedure for SAS 9.4, the XAXISTABLE and YAXISTABLE statements respectively create an X/Y axis-aligned row of textual data placed at specific locations in relation to the primary plot within the given SGPLOT procedure. The XAXISTABLE and YAXISTABLE statements are applicable with any primary plot, aside from BAND, BLOCK, FRINGE, REG, LOESS, and PBSPLINE plots. Along with directing the X, Y coordinates of the supplementary data values, there are many options accompanying the XAXISTABLE/YAXISTABLE statements which allow the user to change the color, order, and position of the accompanying row(s) of data. The XAXISTABLE statement proves its worth when used in conjunction with survival analysis. When dealing with Kaplan-Meier survival curves, the XAXISTABLE statement essentially allows the user to personalize their own at-risk tables when the LIFETEST procedure lacks the functionality necessary for the request. In this framework, the XAXISTABLE statement improves greatly upon what would have previously been an arduous task. Overall, the XAXISTABLE and YAXISTABLE statements are a welcome addition to the SGPLOT syntax, as the user is given even more control over the appearance of a desired plot, making it less likely that the plot needs to be altered after it has been output.
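A minimal sketch of the statement in its simplest form, using a SASHELP data set purely for illustration:

```sas
/* Display a row of WEIGHT values aligned under the X axis. */
proc sgplot data=sashelp.class;
   series x=age y=height;
   xaxistable weight / position=bottom;  /* axis-aligned text row     */
run;
```

The same statement, fed with at-risk counts, is what enables the customized Kaplan-Meier at-risk tables the abstract describes.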


SP-084 : Automating SAS Job Streams With the Power of VB Script
Lindsey Whiting, Kohler
Joey Kaiser, Kohler

In order to make important business decisions it is crucial to have the ability to manipulate and analyze large amounts of data. With the amount of data available to us in the world today, automating SAS jobs has become a necessity to provide efficient and critical business improvements. A great strategy to automate SAS jobs and give you the ability for custom error checking is to leverage VB Scripting with your SAS code. This paper will discuss job stream automation and utilizing SAS to check files' statuses, notify users of program errors and automatically send out the results of your code.


SP-100 : Using SASv9.cfg, autoexec.sas, SAS® Registry, and Options to Set Up Base SAS®
Peter Eberhardt, Fernwood Consulting Group Inc

Are you frustrated with manually setting options to control your SAS® Display Manager sessions but become daunted every time you look at all the places you can set options and window layouts? In this paper, we look at various files SAS accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS Registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.


SP-106 : SAS Techniques to Handle Big Files And Reduce Execution times
Kaiqing Fan, PNC Bank

As senior data scientists and SAS technical leads, we constantly struggle with big data execution, and especially with the long execution times of our SAS engines: sometimes a couple of hours, sometimes more than 40 hours or even longer, even when we use many servers and a great deal of memory. Such long execution times are not acceptable. With the right techniques, execution time can be shortened enormously: I have reduced the execution times of SAS engines from 136 hours to around 2 hours, from 9 hours to 20-30 minutes, and from 3.5 hours to 6 minutes. This paper summarizes most of the technical skills I used and shares them with you.


SP-116 : Order, Order! Four Ways to Reorder Your Variables, Ranked by Elegance and Efficiency
Louise Hadden, Abt Associates Inc.

SAS® practitioners are frequently required to present variables in an output data set in a particular order, or standards may require variables in a production data set to be in a particular order. This paper and presentation offer several methods for reordering variables in a data set, encompassing both data step and procedural methods. Relative efficiency and elegance of the solutions will be discussed.
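One of the data step methods commonly compared in such papers, shown as a hedged sketch with hypothetical variable names:

```sas
/* A RETAIN statement placed before SET fixes the variable order in
   the output data set without changing any values.                   */
data reordered;
   retain id name visit_date result;  /* desired variable order       */
   set original;
run;
```

This works because SAS assigns variable positions in the order it first encounters the names, and RETAIN names them before SET does.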


SP-139 : Keeping Up to Date with ODS Graphics
Warren Kuhfeld, SAS

SAS 9.4M5 provides you with several enhancements to ODS Graphics including a new procedure. You can use PROC SGMAP to create maps and superimpose graphs such as bubble plots. Bar charts in existing procedures such as PROC SGPLOT have new options for fill patterns and fill types. New options for box plots enable you to display statistics and control the caps on whiskers. Other options enable you to modify tick labels, tick styles, legends, baselines, and reference line thickness. You can also control the image names when there are BY groups. This talk describes and illustrates these recent updates.


SAS 301 Beyond the Basics

SB-010 : A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software
Kirk Paul Lafler, Software Intelligence Corporation
Stephen Sloan, Accenture

Data comes in all forms, shapes, sizes and complexities. Stored in files and data sets, SAS® users across industries know all too well that data can be, and often is, problematic and plagued with a variety of issues. When unique and reliable identifiers, referred to as the key, are available, users routinely are able to match records from two or more data sets using merge, join, and/or hash programming techniques without problem. But, when a unique and reliable identifier is not available, or does not exist, then one or more fuzzy matching programming techniques must be used. Topics include introducing what fuzzy matching is along with examples of the SOUNDEX (for phonetic matching) algorithm, and the SPEDIS, COMPLEV, and COMPGED functions to resolve key identifier issues and to successfully merge, join and match less than perfect or messy data.
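The four matching tools the abstract names, applied to a single hypothetical pair of names as a sketch:

```sas
/* Compare two not-quite-identical strings with each technique. */
data fuzzy_demo;
   a = 'Johnathan Smith';
   b = 'Jonathan Smyth';
   phonetic_match = (soundex(a) = soundex(b)); /* phonetic test       */
   spelling_dist  = spedis(a, b);     /* asymmetric spelling distance */
   lev_dist       = complev(a, b);    /* Levenshtein edit distance    */
   ged_cost       = compged(a, b);    /* generalized edit distance    */
run;
```

In practice these scores are computed across candidate record pairs, and a threshold on the distance decides which pairs count as a match.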


SB-019 : Visual Storytelling - The Art of Communicating Information with Graphics
Kirk Paul Lafler, Software Intelligence Corporation

Telling a story with facts alone can be boring, while stories told visually engage. It's been said that humans process visual elements many times faster than written words. The data analysis process involves the gathering, cleansing, transforming, modeling, and storytelling of data from various sources. The objective is to discover, evaluate, understand, and derive useful information from the data to support decision-making. Unfortunately, data analysts sometimes omit a crucial step: developing a visual narrative about the data analysis process and its outcome. This omission not only fails to bring context, insight, and interpretation to the data analysis results in a clear and precise way, it neglects to bring meaning, relevance, and interest to the key points of those results. Topics include the importance, considerations, and steps needed to develop a compelling narrative with visuals; communicating a convincing point of view by letting your visuals do the talking; helping your audience see hidden, or hard-to-see, things in your data; avoiding the obvious by surprising and engaging your audience; techniques for sharing a lasting message by teaching something; and a variety of visuals and graphics that persuade your audience to understand the complexities of the data analysis results.


SB-021 : How to Assembly Line Create Graphic Images Using PROC TEMPLATE in SAS Enterprise Guide? Part I
Kaiqing Fan, Mastech Digital Inc (PNC Bank)

In the banking industry, variables, their values, and requirements are constantly changing. As SAS developers, we may be asked to create hundreds or thousands of composite or single graphic images (scatter plots, series plots, step plots, vector plots, bar charts, line charts, pie charts, waterfall charts, box plots, density plots, histograms, loess plots, penalized B-spline plots, and regression plots) using a pipeline approach through the SAS graphic engines. At that scale, it is impossible to manually modify the many parameters or engine code for each graphic image, and any manual intervention can cause serious errors. The question, then, is how to create composite or single graphic images assembly-line style using PROC TEMPLATE. To reach this target, we need to automatically generate all or most parameters and anticipate expected changes where possible. My paper "Some Tricks and Explanations When Plotting Graphic Images Using PROC TEMPLATE in SAS® Enterprise Guide, Part III" was accepted at SAS Global Forum 2018 (session ID 2545); Parts I and II are presented here. Together, the three parts form a complete treatment of assembly-line creation of hundreds or thousands of graphics in SAS.


SB-034 : Backsplash patterns for your world: A look at SAS OpenStreetMap (OSM) tile servers
Barbara Okerson

Originally limited to SAS Visual Analytics, SAS now provides the ability to create background maps with street and other detail information in SAS/GRAPH® using open source map data from OpenStreetMap (OSM). OSM provides this information using background tile sets available from various tile servers, many available at no cost. This paper provides a step-by-step guide for using the SAS OSM Annotate Generator (the SAS tool that allows use of OSM data in SAS). Examples include the default OpenStreetMap tile server for streets and landmarks, as well as how to use other free tile sets that provide backgrounds ranging from terrain mapping to bicycle path mapping. Dare County, North Carolina is used as the base geographic area for this presentation.


SB-040 : Conversion of CDISC specifications to CDISC data - specifications driven SAS programming for CDISC data mapping
Yurong Dai, Eli Lilly
Jiangang Jameson Cai, Eli Lilly

This paper presents a metadata-driven approach that utilizes SAS programming techniques for SDTM and ADaM data mapping. Metadata extracted from the specifications is converted into dataset attributes, formats, variable names, variable order, and sort order for specification implementation in our reference code. This approach increases the code's reusability, efficiency, and consistency between the data specifications and the output data, and reduces rework after specification updates during code development for SDTM mapping and ADaM dataset derivation.


SB-052 : Show Me That? Using SAS VIYA, Visual Analytics and Free ESRI Maps to Show Geographic Data
John Schmitz, Luminare Data

Visual Analytics includes features to connect to free, premium and custom ESRI map capabilities to display geographic information. This paper provides a simple example for generating a shaded state map, based on input data and free ESRI map capabilities. The paper reviews key configuration settings that impact ESRI map capabilities, generation and promotion of data for use by the geo-map feature, defining the category field for use with geo-mapping, filtering graph data, and producing a state-level shaded map.


SB-059 : Finding National Best Bid and Best Offer - Quote by Quote
Mark Keintz, Wharton Research Data Services

U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Trade System (CTS) and the Consolidated Quote System (CQS). CQS contains every updated quote from each of these exchanges, covering some 8,500 stock tickers. It provides the basis by which brokers can honor their fiduciary obligation to investors to execute transactions at the best price, i.e. at the NBBO (National Best Bid or Best Offer). With the advent of electronic exchanges and high frequency trading (timestamps are published to the microsecond), data set size (approaching 1 billion quotes requiring 80 gigabytes of storage for a normal trading day) has become a major operational consideration for market behavior researchers recreating NBBO values. This presentation demonstrates a straightforward use of hash tables for tracking constantly changing quotes for each ticker/exchange combination to provide the NBBO for each ticker at each time point in the trading day.


SB-060 : From Stocks to Flows: Using SAS® HASH objects for FIFO, LIFO, and other FO's
Mark Keintz, Wharton Research Data Services

Tracking gains or losses from the purchase and sale of diverse equity holdings depends in part on whether stocks sold are assumed to be from the earliest lots acquired (a FIFO queue) or the latest lots acquired (LIFO). Other inventory tracking applications have a similar need for applying either FIFO or LIFO rules. This presentation shows how a collection of simple ordered hash objects, in combination with a hash-of-hashes, is a made-to-order technique for easy data-step implementation of FIFO, LIFO, and other less likely rules, like HIFO (highest price first out) and LOFO (lowest price first out).
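A hedged sketch of the core idea for a single ticker (variable names are hypothetical): an ordered hash object keyed by an incrementing sequence number yields lots in purchase order, which is exactly a FIFO queue.

```sas
/* Ordered hash object as a FIFO queue of purchase lots. */
data _null_;
   length seq shares price 8;
   declare hash lots(ordered:'a');    /* ascending key = FIFO order   */
   lots.definekey('seq');
   lots.definedata('seq','shares','price');
   lots.definedone();

   /* buy two lots */
   seq = 1; shares = 100; price = 50.25; rc = lots.add();
   seq = 2; shares = 200; price = 51.10; rc = lots.add();

   /* sell: the earliest lot is consumed first under FIFO */
   declare hiter it('lots');
   rc = it.first();                   /* oldest lot comes out first   */
   put 'FIFO head: ' seq= shares= price=;
run;
```

Descending order (`ordered:'d'`) gives LIFO; keying on price instead of sequence gives the HIFO and LOFO variants, and a hash-of-hashes holds one such queue per ticker.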


SB-081 : Picture Perfect: An Introduction to the Image Action Set available with SAS® Viya® Programming
Scott Koval, Pinnacle Solutions, Inc

The need for organizations to be able to process and analyze images has been growing. With the release of SAS Viya, a new set of actions for the Cloud Analytic Services (CAS) server has been made available through SAS Visual Data Mining and Machine Learning programming. The Image Action Set allows for users to load, process, and analyze unstructured data found in image files. This paper offers an overview and examples of common actions found within the Image Action Set.


SB-089 : Analyzing Amazon's Customer Reviews using SAS® Text Miner for Devising Successful Product Launch Strategies
Manideep Mellachervu, Oklahoma State University
Anvesh Reddy Minukuri, Comcast Corporation

The digital economy is showing tremendous growth in the 21st century and is having a massive impact on society. E-commerce is one element of the Internet of Things, and its worldwide sales have amounted to 2 trillion US dollars, which shows the popularity of online shopping and implies the evolution of retailers in this industry. A recent study conducted by GE Capital Retail Bank found that 81% of consumers perform online research before buying products. Consumers rely heavily on others' opinions and experiences when choosing a product. Businesses need to understand customers' views of their products, and of competitors' products, for strategic marketing. E-commerce businesses provide a platform for customers to generate user-experience content. Customer reviews are vital for a buyer choosing the best product out of numerous similar products available in the market. Companies need to analyze customers' perspectives through reviews to improve their business, evaluate customer engagement, and devise launch strategies for their products. This paper focuses on analyzing customer reviews, primarily on Amazon, using Python, SAS Text Miner, SAS Sentiment Analysis, and SAS Visual Studio. This project determines which product features receive high or low ratings, how the high-rating features of a best-selling product perform compared to a similar product sold by a different vendor, and how to account for customers' perception of product price across brands when launching a similar new product.


SB-093 : Quality Control for Big Data: How to Utilize High Performance Binning Techniques
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Karlen Bader, Henry M Jackson Foundation for the Advancement of Military Medicine

It is a well-known fact that the structure of real-world data is rarely complete and straightforward. Keeping this in mind, we must also note that the quality, assumptions, and base state of the data we are working with have a strong influence on the selection and structure of the statistical model chosen for analysis and/or data maintenance. If the structure and assumptions of the raw data are altered too much, then the integrity of the results as a whole is grossly compromised. The purpose of this paper is to provide programmers with a simple technique for aggregating data without losing information, and for checking the quality of binned categories in order to improve the performance of statistical modeling techniques. The SAS® high-performance analytics procedure HPBIN gives us basic syntax as well as various methods (Bucket, Winsor, Quantile, and Pseudo_Quantile), tips, and details on how to bin variables into comprehensible categories. We will also learn how to check whether these categories are reliable and realistic by reviewing the WOE (Weight of Evidence) and IV (Information Value) for the binned variables. This paper is intended for any level of SAS user interested in quality control and/or the SAS high-performance analytics procedures.
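A minimal sketch of the bucket-binning step the abstract describes, using a SASHELP data set for illustration (the ODS table name is taken as documented for the procedure):

```sas
/* Bucket-bin two numeric variables into five bins each and capture
   the bin boundaries in a data set for quality review.               */
proc hpbin data=sashelp.heart numbin=5 bucket;
   input weight height;
   ods output Mapping=bin_map;
run;
```

Adding a TARGET statement to a subsequent HPBIN run is what produces the WOE and IV statistics used to judge whether the bins are reliable.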


SB-102 : Speed up your Data Processing with SAS Code Accelerator.
Paul Segal, Teradata

SAS® In-Database Code Accelerator enables DS2 code to execute inside the database without translation to another language (such as SQL). This enables your data preparation steps to be dramatically accelerated, as you can now make use of the multi-threading capabilities in a massively parallel architected platform (such as the Teradata relational database management system [RDBMS] or the Apache Hadoop platform). In this short presentation, we introduce those of you unfamiliar with DS2 to the new features as well as demonstrate how performant it can be by running a live demonstration on the Teradata RDBMS.


SB-114 : Wow! You Did That Map With SAS®?! Round II
Louise Hadden, Abt Associates Inc.

This paper explores the creation of complex maps with SAS® software. The presentation covers the wide range of possibilities provided by SAS/GRAPH and polygon plots in the SG procedures, as well as replays, overlays in both SAS/GRAPH and the SG procedures, and annotation, including ZIP-code-level processing. The more recent GfK maps now provided by SAS, which underlie newer SAS products such as Visual Analytics as well as traditional SAS products, will be discussed. The pre-production SGMAP procedure released with Version 9.4 Maintenance Release 5 will be discussed in context.


SB-140 : Square Peg, Square Hole-Getting Tables to Fit on Slides in the ODS Destination for PowerPoint
Jane Eslinger, SAS

An output table is a square. A slide in Microsoft PowerPoint is a square. The table, being the smaller square, should fit inside the bigger square slide. Right? Well, not always. Despite the programmer's expectations, some tables will not fit on the slide created by the ODS destination for PowerPoint. It depends on the table. For instance, tables with more than 10 rows or more than 6 columns might end up spanning multiple slides. But, just as with the popular children's toy, by twisting, turning, or approaching the hole from a different angle, you can get the peg in the hole. This paper discusses three programming strategies for getting your tables to fit on slides: changing style attributes to decrease the amount of space needed for the table, strategically dividing one table into multiple tables, and using ODS output data sets for greater control over the structure of the tables. Throughout this paper, you will see examples that demonstrate how to apply these strategies using the popular procedures TABULATE, REPORT, FREQ, and GLM.


SB-141 : Advanced ODS Graphics Examples
Warren Kuhfeld, SAS

You can use SG annotation, modify templates, and change dynamic variables to customize graphs in SAS. Standard graph customization methods include template modification (which most people use to modify graphs that analytical procedures produce) and SG annotation (which most people use to modify graphs that procedures such as PROC SGPLOT produce). However, you can also use SG annotation to modify graphs that analytical procedures produce. You begin by using an analytical procedure, ODS Graphics, and the ODS OUTPUT statement to capture the data that go into the graph. You use the ODS document to capture the values of dynamic variables, which control many of the details of how the graph is created. You can modify the values of the dynamic variables, and you can modify graph and style templates. Then you can use PROC SGRENDER along with the ODS output data set, the captured or modified dynamic variables, the modified templates, and SG annotation to create highly customized graphs. This paper shows you how and introduces SG annotation and axis tables. This tutorial is based on the free web book: http://support.sas.com/documentation/prod-p/grstat/9.4/en/PDF/odsadvg.pdf. *Prior experience with ODS Graphics is assumed. Skill Level: Intermediate


SB-145 : Perl Regular Expression - The Power to Know the PERL in Your Data
Kaushal Chaudhary, Eli Lilly and Company
Dhruba Ghimire, Eli Lilly and Company

Perl regular expressions are one of the most powerful and efficient techniques for complex string manipulation. SAS® offers a Perl regular expression engine in Base SAS without any additional license requirement, which makes it a great addition to a SAS programmer's toolbox. In this paper, we present the basics of Perl regular expressions and the various Perl regular expression functions and call routines, such as PRXPARSE(), PRXMATCH(), and CALL PRXCHANGE, with examples. The presentation is intended for beginner and intermediate SAS programmers.
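A small sketch of the idiom built from the functions the abstract names (the data and variable names are hypothetical): compile the pattern once, then match and extract per observation.

```sas
/* Extract a US-style area code where one exists. */
data phones;
   input raw $char30.;
   if _n_ = 1 then re = prxparse('/\((\d{3})\) ?\d{3}-\d{4}/');
   retain re;                         /* reuse the compiled pattern   */
   if prxmatch(re, raw) then
      area_code = prxposn(re, 1, raw); /* capture group 1             */
   datalines;
(317) 555-1234
no phone here
;
run;
```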


SAS Super Demos

SD-149 : Creating a Custom Task in SAS Studio
Danny Modlin, SAS

Whether you are using SAS Studio in its full version or through SAS University Edition, you will notice that predefined tasks are included to help you, the user, generate code for several different purposes in SAS. Have you ever wanted to alter one of these tasks, or even create one of your own? In this Super Demo, we discuss how to edit and create your own SAS Studio tasks to use and share with others.


SD-150 : Executing Open Source Code in Machine Learning Pipelines of SAS Visual Data Mining and Machine Learning
Brett Wujek, SAS

Learn how to incorporate open-source code into your machine learning pipelines to integrate and compare models.


SD-151 : Tune In to Model Tuning
Brett Wujek, SAS

Learn how to build better models faster with the latest advancements in automated hyperparameter tuning in SAS® Visual Data Mining and Machine Learning.


SD-152 : Highly Customized Graphs Using ODS Graphics
Warren Kuhfeld, SAS

Learn how to use the ODS document, PROC TEMPLATE, PROC SGRENDER, a DATA step, and SG annotation to customize every component of the graphs that are produced by analytical procedures.


SD-153 : Heat Maps: Graphically Displaying Big Data and Small Tables
Warren Kuhfeld, SAS

Learn how to use heat maps in graphs, maps, and tables in ODS Graphics. Also learn how to highlight cells in tables in ODS.


SD-154 : What's New in the ODS Excel Destination
Jane Eslinger, SAS Institute

This demo highlights some of the newer features of the ODS Excel destination along with reasons to move to the ODS Excel destination if you have not already.


SD-155 : Creating Pivot tables using ODS Markup
Jane Eslinger, SAS Institute

This demo shows how quickly you can generate pivot tables and pivot graphs from your SAS data. Also demonstrated is how to automate this process by creating a SAS Studio task that generates pivot tables and graphs.


SD-156 : SAS 9.4 ODS in a Nutshell
Cynthia Zender, SAS

Come to this Super Demo to learn the new features of ODS in SAS 9.4. In a nutshell, you'll see examples of using ODS LAYOUT, creating lists and text blocks, and using the Report Writing Interface. Other topics include examples of cascading style sheets, using HTML5 as an ODS destination, and the ODS PowerPoint and ODS ePUB destinations.


SD-157 : Accessibility with ODS Output
Cynthia Zender, SAS

Creating sophisticated, visually stunning reports is imperative in today's business environment, but is your fancy report really accessible to all? Let's explore some simple enhancements that were made in the fourth maintenance release of SAS® 9.4 to Output Delivery System (ODS) that will truly empower you to accommodate people who use assistive technology. ODS now provides the tools for you to meet Section 508 compliance and to create an engaging experience for all who consume your reports.


SD-158 : The Future of SAS Enterprise Guide and SAS Studio
Amy Peters, SAS Institute

Get insights into the roadmap for the two interfaces and how they are converging.


Statistics / Advanced Analytics

AA-029 : Automatic Indicators for Dummies: A macro for generating dummy indicators from category type variables
Matthew Bates, Affusion Consulting

Dummy Indicators are critical to building many statistical models based on data with category type predictors. Most programmers rely on the "class" option within various procedures to temporarily build such predictors behind the scenes. This method carries with it a variety of limitations that can be overcome by auto-generating dummy indicators of all variables below a reasonable threshold of cardinality prior to running such procedures. Statistical modelers may find this topic a real effort and time saver while advanced SAS programmers looking for creative techniques of efficiently automating processes may find this macro worth geeking out over.


AA-030 : Confounded? This example shows how to use SAS chi-square tests, correlations and logistic regression to unconfound a result.
Michael Grierson, Self

The purpose of this paper is to describe an example of how to unconfound a confounded statistical result and to present a recipe for unconfounding an analytic conclusion. The confounded result is the conclusion that, since African American student loan borrowers are more likely to default on their student loans, the Department of Education "cannot ignore the interaction of race and student loans". This paper shows that student loan defaults are more strongly associated (by about 5 times) with lower median income status than with race.


AA-031 : Screening, Transforming, and Fitting Predictors for Cumulative Logit Model
Bruce Lund, Consultant for Magnify Analytic Solutions

The cumulative logit model is a logistic regression model where the target has 2 or more ordered levels. If there are only 2 levels, then the cumulative logit model is the binary logistic model. Predictors for the cumulative logit model might be "NOD" (nominal, ordinal, discrete), where typically the number of levels is under 20. Alternatively, predictors might be "continuous", where the predictor is numeric and has many levels. This paper discusses methods that pre-screen and transform both NOD and continuous predictors before the model-fitting stage. Once a collection of predictors has been screened and transformed, the paper discusses predictor variable selection for model fitting. One focus of this paper is determining when a predictor should be allowed to have unequal slopes. If unequal slopes are allowed, then the predictor has J-1 distinct slopes corresponding to the J values of the target variable. SAS® macros are presented which implement the screening and transforming methods. Familiarity with PROC LOGISTIC is assumed.


AA-035 : Monitoring the Relevance of Predictors for a Model Over Time
Ming-Long Lam, SAS Institute

In today's intelligence-driven economy, corporations increasingly rely on their algorithmic models to run their business. Like all tangible assets, models do depreciate and their accuracies diminish over time. In order to stay competitive, corporations constantly monitor their models. When signs of deterioration of model performance appear, stakeholders can determine if the models have to be proactively refreshed to correct the problems. Since every decision to refresh a model carries risks and can disrupt normal business, a solid business case must be presented to support the request to refresh a model. This paper presents a novel approach for monitoring model performance over time. Instead of monitoring accuracy of prediction or conformity of predictors' marginal distributions, this approach watches for changes in the joint distribution of the predictors. Mathematically, the model predicted outcome is a function of the predictors' values. Therefore, the predicted outcomes contain intricate information about the joint distribution of the predictors. This paper proposes a simple metric that is coined the Feature Contribution Index in this approach. Computing this index needs only the predicted target values and the predictors' observed values. Thus, we can assess the health of a model as soon as the scores are available, and raise our readiness for preemptive actions long before the target values are eventually observed. This index is model neutral because it works for any types of models that contain categorical and/or continuous predictors, and output predicted values or probabilities. Models can be monitored in near real time since the index is computing using simple and time-matured algorithms that can be run in parallel. Finally, it is possible to provide statistical control limits on the index. These limits help foretell whether a particular predictor is a plausible culprit in causing the deterioration of a model's appearance over time. 
Practically, if the indices suggest that the joint distribution of the predictors has changed over time, then you can investigate the causes, prepare for deteriorated model performance, and decide whether to refresh the model.
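The general idea of watching a scoring stream for distributional change, comparing a baseline period against a later period, can be illustrated with a generic two-sample Kolmogorov-Smirnov statistic. This is a hypothetical, language-neutral sketch of drift detection in general, not the paper's Feature Contribution Index or its SAS implementation:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples. A large value
    suggests the distribution has shifted between the two periods."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    n_a, n_b = len(a), len(b)
    max_gap = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, v) / n_a
        cdf_b = bisect.bisect_right(b, v) / n_b
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap
```

Applied to a predictor's (or a score's) values in two scoring windows, a statistic near 0 indicates stability, while a value near 1 indicates the distributions barely overlap.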


AA-041 : Alternative methods of regression when OLS is not right.
Peter Flom, Peter Flom Consulting

Ordinary least squares (OLS) regression is one of the most widely used statistical methods. However, it is a parametric model that relies on assumptions that are often not met. Alternative methods of regression for continuous dependent variables relax these assumptions in various ways. This paper will explore PROCs such as QUANTREG, ADAPTIVEREG, and TRANSREG for these data.


AA-042 : An introduction to classification and regression trees with PROC HPSPLIT.
Peter Flom, Peter Flom Consulting

Classification and regression trees are extremely intuitive to read and can offer insights into the relationships among the IVs and the DV that are hard to capture in other methods. I will introduce these methods and illustrate their use with PROC HPSPLIT.


AA-047 : Propensity Scores and Causal Inference for (and by) a Beginner
Bruce Lund, Consultant for Magnify Analytic Solutions

In an observational study, the subjects are assigned to treatments through a non-randomized process. In the simplest and most typical case there are two treatments, one of which is often deemed the "control". Associated with the subjects is an "outcome" of interest to the researcher. The outcome could be discrete, very often binary, or have continuous numeric values. The researcher wants to know the effect of the treatment on the outcome. But due to the non-random assignment of treatments, a simple comparison of outcomes, such as an average per treatment group, would be biased. One solution for removing the bias rests on finding covariates for the subjects such that the treatment can be regarded as random for subjects having essentially equal covariate values. Once this is accomplished, an analysis of outcomes can be performed. Two SAS® procedures, PROC PSMATCH and PROC CAUSALTRT, conduct the analysis of covariates and analysis of outcomes so that a causal effect can be estimated. This paper provides an introductory discussion of the analysis of causal effects in observational studies and gives examples of the usage of PSMATCH and CAUSALTRT. Other books or papers should be referenced for the advanced theory and details of the methodology.
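One common propensity-score strategy is to pair each treated subject with the control whose estimated propensity score is closest, within a maximum allowed gap (a caliper). The following is a hypothetical sketch of that greedy 1:1 nearest-neighbor idea only; the function name and data layout are invented, and this is not PROC PSMATCH's actual algorithm:

```python
def greedy_match(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on propensity score.

    treated, control: lists of (id, propensity_score) tuples.
    Returns (treated_id, control_id) pairs whose score gap is
    within the caliper; each control is used at most once.
    """
    available = dict(control)          # control id -> score
    pairs = []
    for t_id, t_score in sorted(treated, key=lambda t: t[1]):
        best_id, best_gap = None, caliper
        for c_id, c_score in available.items():
            gap = abs(t_score - c_score)
            if gap <= best_gap:
                best_id, best_gap = c_id, gap
        if best_id is not None:
            pairs.append((t_id, best_id))
            del available[best_id]     # matching without replacement
    return pairs
```

Once matched pairs are formed, outcomes can be compared within the matched sample as if treatment had been randomized, which is the step the outcome-analysis procedures automate.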


AA-077 : Estimating the Impacts of the EDA Public Works Program on County Employments Using SAS/ETS 14.1.
Kwideok Han, Oklahoma State University

Purpose: The Economic Development Administration (EDA) is a federal agency within the U.S. Department of Commerce created by the Public Works and Economic Development Act (PWEDA) of 1965. The primary focus of the EDA is the Public Works Program (PWP), which provides local communities matching grants to promote local economic growth through projects such as the construction of roads, sewers, water supply systems, and industrial parks. This study attempts to identify the impacts of the EDA public works program on county employment levels. The hypotheses to be tested are as follows: 1) EDA public works spending has a positive impact on local employment; 2) urban counties see greater employment generation than rural counties; and 3) there are spatial spillover effects from the EDA public works investments onto neighboring counties' employment. Methods: The study uses a panel data set on the EDA public works projects and county-level socio-economic variables, including county-level private non-firm employment series, during fiscal years 2010-2015. The panel nature of our data allows for the control of unobserved variations in the dependent variable of geographical regions over time. In addition to the panel structure, we utilize the spatial dimension of our data to improve the model performance by capturing the spatial spillover/externality effects of the EDA public works grants on neighboring counties' employment. Both a spatial panel fixed effects model and a spatial random effects model are estimated using SAS/ETS 14.1.


AA-091 : Logistic and Linear Regression Assumptions: Violation Recognition and Control
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Karlen Bader, Henry M Jackson Foundation for the Advancement of Military Medicine

Regression analyses are one of the first steps (aside from data cleaning, preparation, and descriptive analyses) in any analytic plan, regardless of plan complexity. Therefore, it is worth acknowledging that the choice and implementation of the wrong type of regression model, or the violation of its assumptions, can have detrimental effects on the results and future directions of any analysis. Considering this, it is important to understand the assumptions of these models and be aware of the processes that can be utilized to test whether these assumptions are being violated. Given that logistic and linear regression techniques are two of the most popular types of regression models utilized today, these are the ones that will be covered in this paper. Some logistic regression assumptions that will be reviewed include: dependent variable structure, observation independence, absence of multicollinearity, linearity of independent variables and log odds, and large sample size. For linear regression, the assumptions that will be reviewed include: linearity, multivariate normality, absence of multicollinearity and auto-correlation, homoscedasticity, and measurement level. This paper is intended for any level of SAS® user. This paper is also written to an audience with a background in theoretical and applied statistics, though the information within will be presented in such a way that any level of statistics/mathematical knowledge will be able to understand the content.
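As a worked illustration of one check mentioned above (hypothetical, not from the paper): a standard multicollinearity diagnostic is the variance inflation factor, which in the two-predictor case reduces to VIF = 1 / (1 - r^2), where r is the predictors' Pearson correlation. A VIF near 1 means little collinearity; common rules of thumb flag values above 5 or 10.

```python
def vif_two_predictors(x1, x2):
    """Variance inflation factor for two predictors:
    VIF = 1 / (1 - r^2), where r is their Pearson correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r2 = cov * cov / (v1 * v2)
    return 1.0 / (1.0 - r2)
```

With more than two predictors, the same quantity is computed by regressing each predictor on all the others and taking 1 / (1 - R²) from that auxiliary regression.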


AA-092 : Regularization Techniques for Multicollinearity: Lasso, Ridge, and Elastic Nets
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Karlen Bader, Henry M Jackson Foundation for the Advancement of Military Medicine

Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables are linearly related, or codependent. The presence of this phenomenon can have a negative impact on an analysis as a whole and can severely limit the conclusions of a research study. In this paper, we will briefly review how to detect multicollinearity and, once it is detected, which regularization techniques would be the most appropriate to combat it. The nuances and assumptions of L1 (Lasso), L2 (Ridge Regression), and Elastic Nets will be covered in order to provide adequate background for appropriate analytic implementation. This paper is intended for any level of SAS® user. This paper is also written to an audience with a background in theoretical and applied statistics, though the information within will be presented in such a way that any level of statistics/mathematical knowledge will be able to understand the content.
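To make the shrinkage idea concrete, here is a minimal, hypothetical sketch (not the paper's code) of the ridge estimator in the simplest case of a single predictor with no intercept, where the closed form is beta = Σxy / (Σx² + λ). Setting λ = 0 recovers the OLS slope, and larger λ shrinks the coefficient toward zero, trading bias for variance:

```python
def ridge_slope(x, y, lam):
    """Ridge estimate for one predictor, no intercept:
    beta = sum(x*y) / (sum(x*x) + lambda)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)
```

The lasso (L1) penalty has no such closed form in general, which is why it is usually fit by coordinate descent; its distinguishing behavior is that it can shrink coefficients exactly to zero, performing variable selection.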


AA-108 : Using SAS® for Multiple Imputation and Analysis of Longitudinal Data
Pat Berglund, University of Michigan

"Using SAS for Multiple Imputation and Analysis of Data" presents the use of SAS to address missing data issues and the analysis of longitudinal data. Appropriate multiple imputation and analytic methods are evaluated and demonstrated through an analysis application using longitudinal survey data with missing data issues. The analysis application demonstrates the detailed data management steps required for imputation and analysis, multiple imputation of missing data values, subsequent analysis of imputed data, and finally, interpretation of longitudinal data analysis results. Key SAS tools, including DATA step operations to produce needed data structures and the use of PROC MI, PROC MIANALYZE, PROC MIXED, and PROC SGPLOT, are highlighted.


AA-109 : Application of heavy-tailed distributions using PROC IML, NLMIXED and SEVERITY
Palash Sharma, University of Kansas Medical Center

The theory of heavy-tailed probability distributions (Pareto, Weibull, Burr, etc.) has vast applications in many real-life situations and natural phenomena. This area of research is attractive not only for its theoretical probabilistic nature but also for its relevance to various branches of statistics. Heavy-tailed distributions are also used for modeling various biological, actuarial, financial, economic, hydrological, and engineering data. In this paper, we fit a Pareto distribution to data on the number of customers affected by electrical blackouts in the USA. We also simulate data arising from a Pareto distribution and estimate the parameters of the Pareto distribution using maximum likelihood estimation. A suite of SAS procedures is used for all computations, specifically PROC IML, SEVERITY, and NLMIXED.
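For a known lower bound x_min, the Pareto shape parameter has a closed-form maximum-likelihood estimate, alpha_hat = n / Σ ln(x_i / x_min). A minimal Python sketch of that formula for illustration (the function name is hypothetical; the paper itself carries out the estimation in PROC IML, SEVERITY, and NLMIXED):

```python
import math

def pareto_mle_alpha(data, x_min):
    """MLE of the Pareto shape parameter for observations
    x_i >= x_min: alpha_hat = n / sum(ln(x_i / x_min))."""
    n = len(data)
    return n / sum(math.log(x / x_min) for x in data)
```

Smaller estimated alpha means a heavier tail, i.e. extreme blackout sizes are more probable; when x_min is also unknown, its MLE is simply the sample minimum.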


AA-117 : An Introduction to the process of improving a neural network
Yuting Tian

This paper is a follow-up to an earlier paper on deep neural nets. That paper, by Lavery, explored the theory of deep neural nets and, at the end, showed two quick examples. One example, which the author admitted was included just to show code, used a deep neural net to predict loan defaults; the initial model had only fair results. This paper illustrates the process of using SAS tools to improve that network, in the hope that its accuracy can be improved.


AA-120 : Handling Missing Data in Exploratory Factor Analysis Using SAS
Min Chen, Cook Research Inc.

Exploratory Factor Analysis (EFA) is a statistical technique to reduce the dimension of data and to explore the latent structure within the data. Missing data is almost inevitable when conducting EFA. By default, the SAS procedure includes only complete cases, which is often not the researcher's first choice. Given that EFA can be performed on individual-level data or on a correlation or covariance matrix, different data formats can be fed into SAS and different missing data techniques can be applied. This article will demonstrate the above with SAS examples and briefly comment on how this is generally handled in other statistical software.
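One option the matrix-input route enables is building the correlation matrix with pairwise deletion, using every observation that is complete for each pair of variables, and feeding that matrix to the factor procedure. A hypothetical sketch of a pairwise-complete Pearson correlation, with missing values represented as None:

```python
def pairwise_pearson(xs, ys):
    """Pearson correlation using only the observations where
    BOTH variables are present (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5
```

Pairwise deletion uses more of the data than listwise deletion, but because each correlation can rest on a different subset of cases, the resulting matrix is not guaranteed to be positive definite, which is one of the trade-offs such an article would weigh against multiple imputation.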


AA-121 : How to Score Big with SAS Solutions: Various Ways to Score New Data with Trained Models
Scott Koval, Pinnacle Solutions, Inc

After training a statistical model, the next step is to put it into production in order to score new data. While it might be tempting to manually write code to score data, this can lead to problems with precision, complexity, and updating. SAS solutions offer a wide variety of methods to do this. This paper covers several common techniques, including PROC SCORE, the CODE statement, and PROC ASTORE. By learning these approaches, SAS users can easily put complex models to work.


AA-137 : Getting Started with Bayesian Analytics
Danny Modlin, SAS

The presentation will give a brief introduction to Bayesian Analysis within SAS. Participants will learn the difference between Bayesian and Classical Statistics and be introduced to PROC MCMC.


AA-138 : Introduction to Machine Learning in SAS
Brett Wujek, SAS

This presentation answers the questions of what machine learning is and what SAS offers for machine learning. Examples of specific machine learning techniques such as random forests, gradient boosting, support vector machines, neural networks, and k-means are covered.


e-Posters

PO-032 : Great Time to Learn GTL
Richann Watson, DataRich Consulting
Kriss Harris, SAS Specialists Ltd

It's a Great Time to Learn GTL! Do you want to be more confident when producing GTL graphs? Do you want to know how to layer your graphs using the OVERLAY layout and build upon your graphs using multiple LAYOUT statements? This paper guides you through the GTL fundamentals!


PO-044 : Self-service utility to List and Terminate SAS grid jobs
Venkateswarlu Toluchuri, Tech Lead SAS Administrator

SAS® programmers often have difficulty finding information about their submitted jobs and typically depend on interactive client tools such as SAS® Enterprise Guide and PuTTY sessions to terminate them; in most cases, a SAS® administrator has to be involved to clean them up. The solution is to develop a self-service utility so that programmers can list and kill jobs that are no longer required. This approach also improves the overall performance of the environment and removes the dependency on SAS® administrators to kill user jobs.


PO-051 : An Update on the CS Standard Analyses and Code Sharing Working Group
Nancy Brucken, Syneos Health
Jared Slain, MPI Research

The Standard Analyses and Code Sharing Computational Science Working Group is providing recommendations for analyses, tables, figures, and listings for data that are common across therapeutic areas (laboratory measurements, vital signs, electrocardiograms, adverse events, demographics, medications, disposition, hepatotoxicity, pharmacokinetics) in the pharmaceutical industry. Ten white papers are at various stages of development, including six that have been finalized. The latest white paper to be published covers analyses and displays for adverse events. The working group also created an online platform for sharing code. The code repository contains a wealth of scripts that have been written by PhUSE members or donated by other organizations. Crowd-sourcing code development will enable consistent interpretation of methods and substantial savings in resourcing across the industry. This presentation will provide an update on these efforts.


PO-054 : Using your FREQ effectively: Displays to Decipher Proportional Odds in Ordinal Regression
Robert Downer, Grand Valley State University

The proportional odds assumption of the cumulative logit model is an intriguing challenge in the modeling of an ordinal response. Valuable insight for modeling decisions can be gained by further investigation of why the proportional odds assumption has been satisfied or not. This investigation can be exploratory and completely separate from the logistic modeling. Empirical cumulative logit plots are one possibility, but their interpretation is not intuitive with respect to odds or proportions. This paper presents exploratory methods that enhance the toolbox for understanding the proportional odds test. In conjunction with other SAS procedures, effective tabular and graphical options of PROC FREQ are used to support the findings of the proportional odds test. An informal application of the Breslow-Day test is introduced. Model development is not a focus of the paper, but some PROC LOGISTIC details are discussed.


PO-068 : Factors Responsible for Students' Enrollment at Oklahoma State University
Parag Vilas Sasturkar, Oklahoma State University

It is crucial for a university to attract bright students who will become educational leaders in this increasingly competitive world. The Institutional Research and Information Management (IRIM) department at Oklahoma State University (OSU) has been playing a vital role in campus decision-making, managing institutional performance, and providing information, research, and analysis on demand. Therefore, it is very important for IRIM to collect and provide accurate information to market OSU, i.e., to target the right audience (prospective students) every year. The main purpose of this research project is to help IRIM evaluate the different factors that drive an undergraduate student to enroll at OSU. This project will attempt to determine which students have a better chance, or probability, of choosing OSU over other universities. The dataset consists of student data including demographics, admissions, and academic activities. This is a vast dataset containing approximately 45,000 student records collected over the last 2+ years, with more than 15 suitable variables. This project will use SAS Enterprise Guide and SAS Enterprise Miner to conduct predictive analysis using methods such as decision trees, logistic regression, and random forests to determine variables in the prediction of students' enrollment.


PO-107 : An Easy Way to Know When to Buy and When to Sell your Stocks
Kaiqing Fan, PNC Bank

In the stock markets, there are thousands of stocks. How to make extra money from the stock market is always an attractive topic. We know that the best way to make money is to buy low and sell high. The question is how to decide what are appropriately low and high prices for a stock. Here I have two simple ways: 1) use the average price of each stock, based on its historical price data available online, and decide the low and high prices according to each user's appetite; 2) use the normal distribution to decide the upper and lower percentage bands of the prices, which users can also adjust based on their appetites. Many people will have questions about these methods, but if you are interested, please join my presentation; I believe I can answer all or most of your questions.
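Both rules sketched above reduce to thresholds derived from the historical mean and, under the normality assumption, the standard deviation. A minimal, hypothetical Python sketch of such buy/sell bands (not the presenter's actual method or code):

```python
import statistics

def price_bands(prices, z=1.0):
    """Buy/sell thresholds at mean - z*sd and mean + z*sd of the
    historical price series; z controls how aggressive the bands are."""
    mean = statistics.mean(prices)
    sd = statistics.pstdev(prices)   # population standard deviation
    return mean - z * sd, mean + z * sd
```

With z around 1, roughly the middle two-thirds of historical prices fall inside the bands under normality; a more conservative investor would raise z so that buy and sell signals fire less often.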


PO-111 : How to Avoid Possible Tricks When Using DATA STEP MERGE Instead of PROC SQL JOIN
Guangtao Gao, Cleveland State University

When we merge or join large data files, PROC SQL JOIN can avoid lots of potential troubles, but it costs too much execution time; DATA STEP MERGE runs much faster, but it has some potential tricks. If we can carefully avoid these tricks, DATA STEP MERGE would be the better choice. How to avoid these potential tricks is the topic of this paper. I also correct one popular method for modifying the differing lengths of variables directly using the LENGTH statement.


PO-115 : Purrfectly Fabulous Feline Functions
Louise Hadden, Abt Associates Inc.

Explore the fabulous feline functions and calls available in SAS® 9.1 and later. Using CAT functions and CAT CALLs gives you an easier way to streamline your SAS code and facilitate concatenation of character strings. So, leave verbose coding, myriad functions, and the vertical bar concatenation operators behind! SAS® 9.2 (and beyond) enhancements will also be demonstrated.