Archive of Research Projects

These are research projects that faculty offered to potential students, under the auspices of the Statistics Living Learning Community, in 2014-19.


Research Thrust A: Atmospheric/Earth Science
  • (Dr. Michael Baldwin) My research group is focused on one of the most important challenges in the atmospheric sciences: improving the understanding and prediction of high-impact weather events. These weather events (e.g., tornadoes, droughts, flooding, winter storms) affect public safety as well as many sectors of the economy, such as agriculture, energy, water resources, and transportation; their costs can be severe and wide-ranging. My group has focused on both the short-term prediction problem as well as the longer-term challenge of understanding the effects of global climate change on high-impact weather systems. Recently, we have applied machine learning methods to predict the impact of winter weather on road conditions in Indiana. We also continue to develop and apply image processing algorithms that can identify and track individual weather systems in meteorological data. These algorithms allow us to analyze massive data sets, such as multi-decadal climate simulations and high-volume observations from weather radars. Students working in my research group can expect to analyze these data and evaluate experimental prediction systems across multiple seasons.

  • (Drs. Julie Elliott and Lucy Flesch) The Geophysics group uses geodetic data to answer questions relating to the movement of tectonic plates, slip along continental faults, generation of earthquakes and mountain growth, and movement of glaciers. With new satellites in orbit, the volume of geodetic observations of the Earth from space is allowing for unprecedented exploration of the changes in the Earth's surface over short time scales. Recent significant increases in the amount of freely available geodetic data require the development of data processing and tracking algorithms to generate detailed time series of surface motions.

  • (Dr. Sonia Lasher-Trapp) I am very excited to have the opportunity to introduce students in Statistics to real research problems in Atmospheric Science, where we make extensive use of statistical concepts and tools to evaluate trends, variability, and correlations, for example, for very large data sets. The data sets used in my research most often include time series of airborne observations of cloud and precipitation development (using a variety of instruments mounted on the aircraft) and/or 3D radar scans of cumulus congestus clouds (the precursors to thunderstorms), and output from high-resolution 3D numerical simulations of these clouds and the precipitation processes occurring within them. When studying clouds and precipitation, we are always struggling with issues of data representativeness, missing data, limited sampling, etc., and the few statisticians working in our field have made significant advances using different statistical models.

  • (Dr. Robert J. Trapp) We make extensive use of statistical concepts and tools to evaluate trends, variability, and correlations within large data sets. In particular, I use time series of tornado and severe-storm occurrences, Doppler weather radar observations of tornadoes and tornadic storms, and output from numerical model simulations of such storms. In our research on severe weather, we constantly struggle with issues of data representativeness, biases, sampling, etc. Statistical models have helped resolve some of these issues, but we need fresh minds with a strong statistical background to develop further models and otherwise help advance the science.

  • (Dr. Wen-wen Tung) The Earth System Dynamic Predictability Laboratory studies the dynamics and the predictability of phenomena in the Earth systems on a variety of temporal and spatial scales. We draw on data readily available in the public domain or from our own physical model simulations, and we apply or develop methods to perform data analysis. The main thrust of our research is to ask significant and domain-relevant questions. Our work does not end at the analysis results; we train our team, as opportunities arise, to assess and interpret the results at levels that can be disseminated to stakeholders in the public or private sectors. Students interested in applying to work with us can consider one of the following beginning questions: 1. Does urban air pollution affect landfalling typhoons or hurricanes? How, and how much? 2. When an atmospheric river (a very cool phenomenon, check it out!) transits into precipitating weather systems over the US, does the associated phase change of water manifest in the Sun-Earth radiation energy budget? How, and what are the consequences? 3. Has the changing climate manifested in global biomes, and how? What are the implications?

  • (Dr. Wen-wen Tung) Our laboratory specializes in studying the dynamical predictability of Earth and atmospheric systems and related phenomena on a variety of temporal and spatial scales. Some of our datasets are drawn directly from the United States Geological Survey database. For example, recently we have found a particular interest in time series data related to the flows of rivers. For many rivers, especially those in long-developed locations, we have access to reliable records of daily river flow measurements spanning well over 100 years. These complete and relatively lengthy records are an excellent starting point for analysis. Multiscale analysis of geophysical time series is one of our lab's specialties.

  • (Dr. Frederi Viens) Students can work with Viens and with former Ph.D. student L. Barboza (now at the University of Costa Rica) and other Ph.D. students and colleagues at Purdue and elsewhere, to quantify temperature changes, including uncertainty evaluation, over the last 1,000 years regionally and globally. Viens's group's current research draws on global data; a possible new specific focus could be parts of Africa, because that is where climate change will have the biggest impact on populations, and where some of the most effective solutions may reside. Viens recently served as a Franklin Fellow (2010-2011) for the Africa Bureau at the U.S. Department of State, where he advised U.S. diplomacy on environmental challenges facing sub-Saharan Africa. His background is in probability theory and stochastic processes; he works on theoretical topics in stochastic analysis and applications to mathematical finance, mathematical statistics, and environmental modeling.

  • (Dr. Yutian Wu) Our research group aims at understanding the dynamical processes in the large-scale circulation of the atmosphere and how these processes respond to anthropogenic climate change. One current research project focuses on the fastest-warming region on the globe: the Arctic. We are particularly interested in questions such as: what processes cause Arctic warming, how does Arctic warming affect the weather and climate in North America, and are we going to suffer more extreme weather events in the future? The project will be of both scientific and societal importance for better understanding and predicting the future climate in North America. The project will use both observational datasets and state-of-the-art global climate model simulations. Analysis techniques such as time series analysis, spectral decomposition, and maximum covariance analysis will be used.

  • (Dr. Hao Zhang) Many projects are possible, using the U.S. Climate Data Online NCDC, which provides daily, monthly, and annual precipitation and temperature data at thousands of weather stations. Students can build an understanding of time series analysis, spatial correlation and interpolation, extreme value theory, etc. They can practice model fitting and forecasting. They will learn skills to manage data, e.g., editing, merging, and splitting data sets. Students will also be introduced to topics not taught in undergraduate classes, e.g., spatial interpolation and extreme value theory.
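
As a rough illustration of the spatial interpolation mentioned in the last project above, the sketch below estimates a value at an unobserved location from nearby stations using inverse-distance weighting. This is only one simple interpolation scheme, chosen here for brevity; it is not stated to be the method used by any of the groups. The function name `idw_interpolate`, the `power` parameter, and the station coordinates and values are all hypothetical.

```python
import math

def idw_interpolate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    `stations` is a list of (x, y, value) tuples; `target` is an (x, y) pair.
    Closer stations get larger weights (1 / distance**power).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station: use its value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: (x in km, y in km, daily precipitation in mm)
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
estimate = idw_interpolate(stations, (5.0, 0.0))
print(f"Interpolated precipitation: {estimate:.2f} mm")
```

A larger `power` makes the estimate follow the nearest station more closely; methods such as kriging additionally model spatial correlation, which this sketch ignores.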


Research Thrust B: Biostatistics
  • (Dr. Ruben Claudio Aguilar) My cell biology laboratory is particularly interested in basic cellular mechanisms, with an emphasis on vesicle trafficking (e.g., intracellular protein transport). We produce enormous data sets daily from our morphometric analysis of microscopy-generated cell images. The analysis of these (and similar) result collections will be valuable to the students and useful to us. We expect that, following initial training, the students will be able to propose and discuss the advantages and disadvantages of different analytical approaches and to actively participate in the experimental design. In the past, I have successfully recruited undergraduate students from the biology courses I teach. In addition, I participate in the NSF-backed Louis Stokes Alliance for Minority Participation (LSAMP) program and the Purdue Summer Research Opportunity Program (SROP). In our lab, undergraduate students receive scientific training and are presented with the opportunity to pursue independent research sub-projects. In addition, our undergraduates participate in lab meetings (where they are encouraged to contribute and ask questions), and they are trained in good practices of scientific presentation. Indeed, our students have been very successful in their research endeavors; they have received multiple awards for poster presentations at undergraduate research events and several paper authorships.

  • (Dr. Hyonho Chun) Recent advances in high-throughput sequencing technology produce massive data for revealing DNA sequence composition, finding transcription factor binding, and quantifying gene expression levels; these are 2-3 GB per assay; with multiple assays (replicates), this is truly "Big Data". A sequencing machine reveals the bases of millions of short segments of DNA or RNA in a massively parallel way. The resulting reads need to be mapped back to the genome. This can be done with many free, open-source software tools such as Bowtie and SOAP. One needs to summarize the mapping results, called the pile-up step, to see whether there is a base pair change in DNA (SNP discovery), whether the transcription factor binding occurs (peak calling), and to measure how genes are expressed (transcribed). Afterwards, one can perform statistical analysis. Since the sequencing techniques are new, most analyses are based on very simple statistical methods, and should be understandable to undergraduates with appropriate guidance and discussion. The students will benefit from working with Chun and Ward on Condor for parallel computational analysis.

  • (Dr. Laszlo Csonka) We have two potential projects that could involve sophomore students. One of these involves comparing the rates of evolution of "meaningless" non-coding sequences and gene-coding sequences in Escherichia coli, Salmonella enterica, and other closely related Enterobacteriaceae.

  • (Dr. Laszlo Csonka) The second one consists of investigation of the conservation of gene order (synteny) in distant species of bacteria. Both of these projects require analyses of very large DNA sequence data sets, and therefore would be ideal for computer-savvy statistics majors. It will be a great learning experience for them to be exposed to the data, vocabulary, and way of thinking of biologists.

  • (Dr. Rebecca Doerge) Trainees will study an epigenetic modification called DNA methylation, which plays a role in cellular differentiation and cancer development. "Next-Generation Sequencing" (NGS) technologies yield discrete count data, at single-base resolution, across the entire genome. With sodium bisulfite treatment (which causes changes to the DNA based on individual cytosine methylation status), NGS can be used to investigate DNA methylation. Students can perform Fisher's exact test for differences in methylation levels at every genomic cytosine. Using start/stop locations, students can essentially test every gene for differences in methylation levels. The dichotomy between cytosine-level and gene-level testing allows students to experience statistical issues such as data quality, variability, and multiple testing in large-data applications.

  • (Dr. George Moore) Our research and collaborations involve veterinary medical and veterinary public health data generated from Purdue's Veterinary Teaching Hospital, large veterinary practices, or commercial veterinary diagnostic laboratories. Projects for student involvement will include practical applications of medical dataset structure, handling missing patient data, appropriate statistical methods, and presentation of data/findings for veterinary clinical audiences and publication.

  • (Dr. Doraiswami Ramkrishna) In my research group, we have been developing mathematical models to describe metabolism since the 1980s. In doing so, we have developed our own theory to describe the metabolic behavior of cells. Our main goal is to compare our model predictions with high throughput bioinformatic data that represent intricate intracellular processes on a genomic level. A variety of technologies are equipped with the power to provide the needed quantity of data including microarrays, RNA-seq, and protein mass-spectroscopy. The overall goal of this project is the validation of a metabolic theory by means of extracting patterns from data. Looking for trends in the differential expression of genes in volumes of omic data, and comparing them with model predictions, presents the opportunity for the authentication of this model at the genome level. Approaches for analyzing high throughput bioinformatic data are diverse and extend to data mining, Bayesian statistics, and Markov Chain Monte Carlo analysis.

  • (Dr. Doraiswami Ramkrishna) Identification of conserved, ecologically meaningful, transcriptional responses in teleosts following oil exposure: RNA-Seq is a high throughput next generation sequencing technology that has developed into a powerful tool for ecotoxicology research, as it allows for rapid, accurate quantification of the transcriptome of non-model fish species following exposure to anthropogenic and environmental stressors. Importantly, several studies have linked transcriptome changes with endpoints of ecological significance (survival and growth). The goal of this project is to identify conserved transcriptional responses in teleosts following oil exposure. We have close to 150 RNA-Seq libraries from six fish species. Our specific aims are to: 1) Study gene responses in fish following oil exposure based on their habitat (pelagic: Red snapper, Red drum, Atlantic croaker vs. benthic: Southern flounder) and 2) Evaluate changes in gene expression after oil exposure in different life stages (embryonic, post-hatch, post-larval) in two estuarine fish species (Gulf Killifish and Sheepshead minnow). By identifying the genes and signaling pathways that are differentially expressed after oil exposure in relation to species, habitat and developmental stage, it will be possible to better understand how fish respond to oil and enhance our understanding of the toxic mechanisms of action of oil in marine fishes.

  • (Dr. Maria Sepulveda) Water quality has a huge influence on fish physiology. Marine fish of course are healthier when raised in high salinity (30 ppt) water. However, it is cheaper and easier to raise marine fish in low salinity (< 5 ppt) conditions. This is important because we are relying more and more on hatchery raised fish for our consumption since most marine fish stocks have been depleted. We raised Florida Pompano, a marine fish, under low and high salinity conditions and noticed that some of the fish raised under low salinity conditions did very well while others got sick and died. We collected tissues responsible for osmoregulation (kidneys, liver, gills and gastrointestinal tract) from healthy and sick fish and conducted Next Generation Sequencing to determine differentially expressed genes in these two groups of fish. Specific objectives of this project include: 1) establish transcriptome libraries for gill, liver, kidney and gastrointestinal tract of Florida pompano reared in high and low salinities; 2) identify gene transcripts for osmoregulatory genes, key metabolic enzymes and stress response; 3) compare gene transcript abundance between Florida pompano reared in high and low salinities; and 4) discover unique sequences that may play key roles in the adaptability of marine fish to low salinity.

  • (Dr. Lyudmila Slipchenko) We develop a new polarizable force field BioEFP for modeling processes in biology, biomedicine and materials. Potential applications of BioEFP are in drug design, cancer research, bioimaging and photovoltaics. BioEFP is based on ideas derived from quantum mechanics and does not contain parameters fitted to experiment. Instead, parameters are obtained from electronic structure calculations on chemical fragments. The accuracy of the BioEFP force field is superior to the accuracy of common classical force fields. One of the main shortcomings of BioEFP is that the parameters are not readily available but have to be computed a priori. To overcome this obstacle, we propose to create an online repository of pre-computed fragment parameters and develop a similarity search algorithm that would ascribe each fragment of a biological or materials macromolecule to a pre-defined fragment. As a longer-term goal, we propose to interface a high performance computing (HPC) cluster with a web-interface such that missing parameters could be computed on-the-fly. We expect the fragment database will contain several thousands of chemically unique fragments; the amount of data associated with each fragment ranges from several Kb to several Mb.

  • (Dr. Jun Xie) Students will learn about statistical methods for large-scale genomic data analysis. Nowadays whole-genome genetic information is commonly available in disease studies and clinical trials. For example, genome-wide association studies analyze a large number of common genetic variants, i.e., single nucleotide polymorphisms (SNPs), in individuals to examine whether any genetic variants are associated with a disease. Another example is pharmacogenomics research, which uses whole genome information to predict individuals' drug response. Students will learn about modern statistical methodology developed for these types of big data, including multiple testing rules, variable selection, and dimension reduction methods. Students can gain hands-on experience through statistical analysis of specific data sets from the databases of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH).

  • (Dr. Jun Xie) Students will participate in a project to conduct secondary data analysis and integration of existing datasets and database resources, to elucidate the genetic architecture of disease risk and related treatment outcomes. There is a huge amount of biomedical data, generated from numerous genomic studies. Leveraging these data through innovative analysis will help to better understand disease risk, progression, and treatment outcomes. Students will explore existing genetic or genomic datasets, e.g., the databases of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH), and perform statistical analyses of the existing data.
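
Several of the projects above mention per-site testing and multiple testing, for example Fisher's exact test for differences in methylation at each cytosine. The sketch below is a minimal, standard-library illustration of that idea, not the pipeline of any group listed here: it computes a two-sided Fisher's exact p-value for a 2x2 table of methylated and unmethylated read counts, then applies a Bonferroni correction across sites. The site names, read counts, and the 0.05 level are hypothetical, and Bonferroni is just one simple multiple-testing rule.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Hypothetical read counts per cytosine:
# (methylated in A, unmethylated in A, methylated in B, unmethylated in B)
sites = {
    "site_1": (18, 2, 6, 14),
    "site_2": (10, 10, 9, 11),
    "site_3": (3, 1, 1, 3),
}

alpha = 0.05
threshold = alpha / len(sites)  # Bonferroni-adjusted per-site level
for name, counts in sites.items():
    p_value = fisher_exact_two_sided(*counts)
    flag = "different" if p_value < threshold else "not flagged"
    print(f"{name}: p = {p_value:.4g} ({flag})")
```

With millions of cytosines, Bonferroni becomes very conservative, which is one reason the gene-level versus cytosine-level comparison described above is statistically interesting.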

Research Thrust C: Healthcare Engineering and Healthcare/Biomedical Analytics
  • (Dr. Azza Ahmed) Ahmed's research is focused on developing and testing interventions that support and improve breastfeeding outcomes among vulnerable populations, specifically, preterm infants and low-income mother/infant dyads. She has been collaborating with the Indiana WIC program to study breastfeeding outcomes among late preterm and early term infants in a longitudinal study. She designed LACTOR, an interactive web-based breastfeeding monitoring system. She just finalized a randomized controlled trial to test the effect of LACTOR on breastfeeding outcomes with a large online dataset. Dr. Ahmed is also collaborating with Purdue Animal Sciences, Purdue Statistics, and Eskenazi Health, in a longitudinal study to test the effect of sleep quality during pregnancy on breastfeeding outcomes. She is also collecting data on peripartum depression, night eating habits, and obesity.

  • (Dr. Azza Ahmed) Tele-Lactation Support Study: Dr. Ahmed and her team are currently conducting a study to assess barriers and contributing factors in implementing tele-lactation support among low-income mothers. The long-term goal of this study is to develop a culturally sensitive, family-centered lactation support intervention to increase access to professional lactation education and support and to minimize racial breastfeeding disparities among low-income families. The primary goal of this proposed project is to assess the feasibility and acceptability of implementing tele-lactation support, in the form of interactive web-based/mobile postpartum breastfeeding monitoring, tailored education, and videoconferencing lactation support, to improve breastfeeding initiation, exclusivity (feeding only breast milk), and duration among low-income mothers. We aim to determine the feasibility of using tele-lactation support after hospital discharge among low-income mothers, elucidate mothers' experiences, perceptions, and acceptance of this innovative technology-based intervention, and determine the barriers to successful use of the intervention. We will also examine social and ecological factors associated with the use of this innovative technology and its effect on breastfeeding exclusivity and duration among this vulnerable population.

  • (Dr. Azza Ahmed) Recent studies have revealed a significant gap in knowledge about breast milk donation through milk banks and about barriers to milk donation among low-income mothers. This project is a collaboration between the Indiana Milk Bank and the Purdue University School of Nursing to raise awareness of milk donation among low-income mothers. The main goal of this project is to increase the quantity of milk donated by low-income mothers by raising awareness of the benefits of milk donation to infants, mothers, and the community. The project will address the following Specific Aims (SA): SA1: To determine the characteristics of milk donors and ecological factors associated with milk donation, we will analyze the current Milk Bank database and provide a description of the donors' demographics, infants' gestational age, time of beginning milk donation, and quantity provided, and assess factors associated with milk donation. SA2: Elucidate low-income mothers' knowledge and perception of milk donation. A prospective cross-sectional observational study using a semi-structured survey will be conducted among low-income mothers who meet the income guidelines of the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). SA3: Develop and test educational resources in the form of brochures and presentations targeting pregnant women and their partners to raise awareness of the benefits of breastfeeding and milk donation while addressing mothers' concerns and perceptions about milk donation. Presentations will also be prepared for health care providers such as nurses, lactation consultants, and staff of community health organizations. This project addresses health equity by increasing the chance for each premature infant to receive human breast milk and by increasing knowledge about milk donation among low-income mothers, which may improve the breastfeeding continuation rate among this vulnerable population.

  • (Dr. Ulrike Dydak) Dydak's area of expertise is in Magnetic Resonance Imaging and Magnetic Resonance Spectroscopy. She also maintains a research lab at the Indiana Institute for Biomedical Imaging Sciences (IIBIS) at the Indiana University School of Medicine. She is currently working with colleagues in Biostatistics, Neurology, Toxicology and Psychiatry, designing and implementing clinical MRI/MRS studies. For instance, at present, they are working on finding significant effects in datasets that contain measurements of metabolite concentrations from different brain regions, and they correlate those measurements with biological measures, diagnostic groups, levels of environmental exposure and other measures.

  • (Dr. Haslyn Hunte) As Assistant Director of the Center on Poverty and Health Inequities (COPHI) at Purdue University, I work on reducing poverty-related inequities through partnerships with local communities. We study trends and problems such as insufficient access to food, barriers to treatment and health care inequalities, and inequities in policies, especially with regard to the poorer segments of populations. We believe that students benefit from seeing the full scope of the data analysis and policy research that we work on. For example, I believe that some of the sophomore participants will appreciate (and perhaps even relate to) the variables I study as part of a funded project on the health care safety net population. The specific aims of the research project are to provide insights into strategies that will 1) improve cost-effective healthcare delivery and 2) reduce disparities in health outcomes for vulnerable populations by establishing new methods for the planning and operation of the safety net system. The research objectives are: 1) Identify and map the locations where individuals live and where they receive care within the core safety net provider system. 2) Determine whether bypass behavior is exhibited by patients for each episode of care they seek. 3) Determine the association between sociodemographic variables and care-seeking behavior. To achieve our objectives we will utilize a data set with 69 million patient encounters over a five-year period from the Indianapolis Metropolitan Statistical Area.

  • (Dr. Haslyn Hunte) I am also engaged in more traditional social epidemiology research that would also provide opportunities for one or more mentees. Using several large datasets, I am interested in the following research questions: 1) What is the association between experiences of interpersonal discrimination and health behaviors and health outcomes? 2) To what extent, if any, does racial/ethnic discrimination explain any of the observed racial/ethnic disparities in health-related outcomes like tobacco and alcohol use/abuse, obesity, and elevated blood pressure? 3) What is the relationship between positive psychological functioning and psychological challenges such as discrimination, and how do they interact to produce the absence or presence of poor health? 4) Does the heterogeneity within the US Black population explain any of the observed Black-White disparities in health outcomes?

  • (Dr. Nan Kong) Acoustic Data Analysis for Characterizing Pet Dog's Behaviors: Behavior problems shown by pet dogs are considered to reflect their suboptimal physiological condition and social environment; however, the etiology of each behavior problem has yet to be revealed because the clinical population is likely heterogeneous. Our laboratory has studied behavioral responses as well as physiological variables associated with canine behavior problems. The current project will focus on vocal sounds of dogs with behavior problems to investigate whether there are distinct acoustic patterns, which can help us infer different underlying emotional motivations in affected dogs. The funded student will use sound editing software, like Adobe Audition, to extract features from the voice files, and perform multivariate analysis, e.g., discriminant function analysis, and univariate analysis, e.g., ANOVA. The funded student will work closely with Drs. Kong and Ogata, Assistant Professor of Animal Behaviors from the College of Veterinary Medicine. The student will be assisted by Miss Carolina Vivas-Valencia, a PhD student from Dr. Kong's research lab.

  • (Dr. Nan Kong) Glucometer Data Analysis for Understanding the Impact of Activities and Interventions on Diabetes Management: A chronic disease is a medical condition that can last for a long period of time and can progressively cause disability, and even death. Many chronic diseases are caused by, or exacerbated by, multiple environmental features or behavioral factors, such as tobacco use, diets high in fat, and physical inactivity. Type-II diabetes mellitus is one such chronic disease, for which we have little understanding of how environmental and behavioral variables exert their influence at the individual level. A group of Indiana University Medical School researchers have engaged diabetes patients in a human subject study in which the patients' glucose data are continuously recorded together with the corresponding activity ("eating", "sitting", etc.) information entered by the experimental subjects. The funded student will perform wavelet-based feature extraction on continuous glucose monitoring data and develop various classifiers to predict hypoglycemia events (extremely low glucose level). The funded student will work closely with Dr. Kong, and have the chance to attend regular meetings with a cross-disciplinary group of researchers at the Indiana University Center for Aging Research (i.e., biweekly phone meeting and biweekly face-to-face meeting). The student will be assisted by Miss Carolina Vivas-Valencia, a PhD student from Dr. Kong's research lab.

  • (Dr. Nan Kong) Model-based Characterization of Age- and Gender-specific Colorectal Cancer Progression: One key prerequisite to personalized cancer medicine is to learn how a person's tumor grows. Based on this knowledge, doctors hope to find prevention, screening, and treatment strategies that may be more effective. In this project, we will learn how Bayesian statistics can help characterize colorectal cancer progression for each age- and gender-specific population group, together with an individually based state-transition disease model.

  • (Dr. Nan Kong) Comprehensive Mass spectrometry data analysis for proteome profiling and biomarker identification: Mass spectrometers have become promising instruments to acquire proteomic information. However, comprehensive proteome profiling and biomarker identification for disease diagnosis has fallen behind. In this project, we will learn how to conduct feature extraction, binary classification, and feature ranking on several mass spectrometry data sets.

  • (Dr. Nan Kong) Characterization of Medication Adherence Post Vascular Event: Adherence to preventive medications prescribed after vascular or cardiac events such as acute myocardial infarction (AMI), transient ischemic attack (TIA) or acute ischemic stroke is low and non-adherence has been associated with poor outcomes. Wireless technology and behavioral approaches have shown promise in improving health behaviors. Understanding how best to deploy these interventions for maximum impact is lacking, however. In this project, we will learn how parametric survival analysis can help characterize the behavior of medication adherence for a diverse group of people based on emerging data collected from smart pill bottles.

  • (Dr. Nan Kong) Feature Selection in Biomedical Image Analysis. Advanced imaging modalities (e.g., functional MRI and label-free imaging) are increasingly used for structural, functional, metabolic, as well as biological image analyses. Underpinning much of the research is the need to develop new methodologies that can extract useful information from very large databases. Methodological advances in biomedical data mining are expected to revolutionize the practice in many specialties of clinical practice. Among the various developments, an important task is to extract features that can be used to differentiate/label subjects with respect to identified structural, functional, metabolic, or biological features. In this project, we will investigate feature extraction tasks for two types of data, fMRI and photoacoustic data. The funded students are expected to work closely with Dr. Kong and periodically meet with two BME professors Dr. Ji-xin Cheng, an expert on optical spectroscopy, and Dr. Zhongming Liu, an expert on fMRI.

  • (Dr. Nan Kong) Fall Detection using Time-Series Data. Falls are a common problem for the elderly, often resulting in hospitalization. Despite extensive preventive efforts, falls continue to be a major source of morbidity and mortality among the elderly. Real-time detection of falls may enable rapid medical assistance, thus increasing the sense of security of the elderly and reducing some of the negative consequences of falls. In this project, we will analyze time series of 3D accelerometer data collected on simulated falls performed by healthy volunteers. The objective of the project is to develop fall detection algorithms and conduct comparative studies. The funded student is expected to work closely with Dr. Kong and periodically meet with Dr. Shirley Rietdyk, Professor of Health and Kinesiology.

  • (Dr. Mark Lawley) The first project involves a large data set with 30 years of patient data on outpatient appointments, emergency department usage, hospitalizations, and laboratory results. The students would need to first understand the nuances of working with de-identified medical data, confidentiality, HIPAA guidelines, etc. The intended outcome of the project would be a set of models for predicting the cost and health impacts of no-show behavior (failing to attend a scheduled medical appointment) for chronically ill patients. Our past work has shown a strong correlation between no-show behavior and increased use of hospital resources, but we need additional work to better explore and understand these relationships. Because we are all users of the U.S. healthcare system, this is an important problem to which the students can easily relate. Further, it introduces them to a number of important statistical techniques in a practical, concrete way in a context that they can appreciate.

  • (Dr. Mark Lawley) Another project involves diabetes. Students should, once again, relate to the context of this problem since they will almost certainly have relatives or close acquaintances afflicted by diabetes. The management of diabetes requires a careful balancing act. Patients that have chronically high glucose levels (hyperglycemia) risk long term damage to the kidneys, heart, eyes, and feet. On the other hand, over-control of glucose levels can lead to short term glucose levels that are far too low (hypoglycemia), which can cause dizziness, incoherence, fainting, and other problems. We are in the process of obtaining a large data set on diabetic patient glucose levels which we will use to study this problem of short and long term risk balancing. The students could help with time series analysis and learn about some of the simulation and optimization techniques we intend to use in the work.

  • (Dr. Laura Prouty Sands) My training in multivariate modeling and psychometric analysis of survey instruments, combined with my 25 years of health outcomes research, gives me the content expertise to effectively mentor undergraduates interested in learning how to analyze and interpret databases relevant to healthcare practice and policy. My research is focused on determining optimal care pathways for vulnerable older adults. Currently I am funded by two NIH grants. The first assesses risks for post-operative cognitive decline among older surgical patients. My role on that project is to develop the methods for detecting post-operative cognitive decline and to supervise analyses of project data.

  • (Dr. Laura Prouty Sands) The second project is directed toward determining health outcomes of unmet need for disabilities among older adults using survey and Medicare claims data. I have mentored eight Ph.D. students from the Department of Statistics, as well as two undergraduate Statistics students. I have access to a wide range of datasets related to healthcare practice and policy, e.g., the Inter-university Consortium for Political and Social Research (ICPSR), as well as data from the Centers for Medicare and Medicaid and the Centers for Disease Control and Prevention (CDC).

  • (Dr. Cleveland Shields) In my lab, we study physician-patient communication. We have several datasets that we are analyzing. Students in the Statistics Learning Community would have the opportunity to help generate research questions and run statistical procedures on the data.

  • (Dr. Cleveland Shields) Every semester, we have 4 to 6 undergraduate students working in our Relationships and Healthcare Lab, which I co-direct with Dr. Melissa Franks. Thus, we have considerable experience integrating undergraduates in research projects. We can involve students in analyzing data on three projects. First, students can work on data analysis for a project that Dr. Franks and I are conducting on hospital readmission of patients with diabetes, examining discharge planning and family involvement.

  • (Dr. Cleveland Shields) Second, colleagues and I in the Regenstrief Center for Healthcare Engineering (RCHE) are conducting a study of health services utilization to identify geographic locations producing high utilization and costs, using longitudinal data from medical records from St. Vincent Hospital Systems in the Indianapolis area. Students could help design and conduct the analysis for this project.

  • (Dr. Cleveland Shields) Finally, I am conducting a field experiment examining physician-patient communication. We will be gathering 240 audio recordings of interactions between physicians and actors who will be portraying a patient role. Dr. Sharon Christ serves as the statistician on this project. We will be conducting psychometric analyses to understand the underlying constructs in the measurement of communication in these medical encounters. This presents a context that is surprising and is likely to pique students' interests.

  • (Dr. Lingsong Zhang) Zhang is working with Lawley (Texas A&M) and Sands (Virginia Tech) on statistical modeling of patient "no-show" behavior. They are collaborating with Alliance of Chicago, using scheduling data and electronic medical records from 7 clinics over 3 years. The focus is on diabetic patients, who visit regularly. They want to involve students in using scheduling history and demographic factors (payer class, income, education, age) to predict no-show probability and uncertainty.

  • (Dr. Lingsong Zhang) Zhang is also working with S. Witz and K. Musselman (from the Regenstrief Center for Healthcare Engineering, RCHE), H. Wan from Purdue Industrial Engineering, and J. Castro from University of South Florida, on analysis of hospital readmission characteristics and prediction. RCHE is working with BayCare Health System (Tampa, FL) and St. Vincent Health (Indianapolis, IN), using multiple-year discharge data and patient characteristics, to identify (1) readmission probability upon discharge, (2) clinical/demographic factors associated with readmission, and (3) performance comparisons of hospital readmissions.

  • (Dr. Lingsong Zhang) Analysis of employer medical insurance data: At the Regenstrief Center for Healthcare Engineering, there is a claims data set from a self-insured employer covering more than 3 years. The employer provides up to three insurance plans, each with a different copay percentage and deductible. The medical claims for all employees are available in the data along with other useful demographic and health information. We intend to investigate this data set in two projects: whether there is a trend in insurance claims over time, and whether the plans can be better designed to reduce both the employee's and the employer's costs.

  • (Dr. Lingsong Zhang) Surgery room gesture design and identification: In collaboration with faculty members from industrial engineering and the Indiana University medical school, we investigate the possibility of inventing useful gestures that can be used in neurosurgeries, to avoid possible hygiene problems of using a computer in the operating room. We have designed 34 useful gesture sets, each containing 9 candidate gestures. The project is to compare and rank all the gestures and select the "optimal" gesture for further usage. In this project, we will start from other gesture sets to develop approaches to identify profile differences between gestures.

Research Thrust D: Probability and Theoretical Statistics
  • (Dr. Guang Cheng) Students will learn resampling using bootstrap methods. The bootstrap is widely applicable for inference in massive data; however, it is computationally demanding. Kleiner et al. introduce the Bag of Little Bootstraps (BLB): a robust, computationally efficient means of assessing the quality of estimators. It combines the results of bootstrapping multiple small subsets on parallel computing architectures. G. Cheng proposes to investigate with students whether applying the m out of n bootstrap or subsampling within each subset of the BLB will overcome the inconsistency in the bootstrap. Longer term: he wants to study BLB under the settings of M-estimation with a student, e.g., its consistency, asymptotics and computational efficiency in dealing with massive data.

  • (Dr. George McCabe) Responding to increased levels of obesity: lessons from three countries. This project will investigate how people choose what they will eat by reviewing the literature on the anthropology of food, examining the importance of price and marketing strategies, and considering how women's engagement in work outside the home is affecting the production of home-cooked meals and reliance on pre-prepared and restaurant meals. Statistical issues include the analysis of data from national surveys in the three countries (US, France, and India) and a meta-analysis of anti-obesity interventions.

  • (Dr. Raghu Pasupathy) Motivated by contexts such as air quality measurement using cheap sensors, energy monitoring through smart meters in large buildings, and tracking stock tickers on mobile devices, we ask: Is existing statistical and simulation methodology adequate for online "big data" contexts? How should methods for estimating traditional statistical measures, e.g., quantiles and conditional value-at-risk, adapt to the online context? Are there low-storage, fast-compute versions of function estimators, e.g., kernel densities and stochastic kriging, that are just as accurate as existing estimators? Students will help to construct and analyze O-estimators, a new class of estimators characterized by (provably) minimal storage and update complexities, and having convergence rates matching those of analogous traditional statistical objects.

  • (Dr. Ilya Pollak) An important area of image processing that I work in is segmentation, i.e., developing computer algorithms for automated detection of object boundaries in images. This is a critical image analysis step in problems arising in many areas, such as biomedical imaging, computer vision, and microscopy of materials. The analysis of such algorithms requires statistical comparisons of the algorithms' outputs on a large image database with ground-truth segmentations. Constructing ground-truth segmentations and writing basic utilities for such comparisons (in R or in Matlab) would be a great sophomore research project.

  • (Dr. Ilya Pollak) In the area of quantitative finance, there are a number of recent papers on the analysis of so-called technical indicators. A very interesting sophomore research project would be to read one of these papers, implement (again, in R or Matlab) several indicators described therein and conduct statistical analysis of forecasting performance of these indicators on real market data.

  • (Dr. Jianxi Su) Current regulatory frameworks require enhanced techniques for measuring and managing extremal risks of financial enterprises. In particular, this involves analyses of standalone risks and dependence structures among them. Su's research aims to develop analytically tractable and practically interpretable quantitative risk management tools to analyze the dependencies among actuarial/financial risks. In this project, students will adopt a novel class of full-range tail dependence copulas to model large volumes of financial data.

  • (Dr. Xiao Wang) Wang is currently working on projects related to statistical computing, spatial statistics and image analysis. Specifically, Dr. Wang is developing deep learning methods for neuroimaging data. For example, one of the studies is to use the predictive value of ultra-high dimensional imaging data and/or other scalar predictors (e.g., cognitive score) for clinical outcomes including diagnostic status and the response to treatment in the study of neurodegenerative and neuropsychiatric diseases, such as Alzheimer's disease (AD). The growing public threat of AD has raised the urgency to discover and validate prognostic biomarkers that may identify subjects at greatest risk for future cognitive decline and accelerate the testing of preventive strategies. In this regard, prior studies of subjects at risk for AD have examined the utility of various individual biomarkers, such as cognitive tests, fluid markers, imaging measurements, and some individual genetic markers (e.g., ApoE4 gene), to capture the heterogeneity and multifactorial complexity of AD (reviewed in Weiner et al. 2012). He wants to include more undergraduate students in this project.

  • (Dr. Xiao Wang) During the past decade, deep learning has demonstrated great potential in solving many complex artificial intelligence tasks such as pattern recognition and speech understanding. Dr. Wang's current research focuses on understanding complex and high dimensional data structure using deep learning. Past sophomore student projects include: 1) A neural network approach to real time bidding for advertisement click prediction; 2) Real time car detection for road traffic. Dr. Wang is also working on projects that adapt deep learning to healthcare analytics, in particular, to approximate complex biological systems with high dimensional complex biomedical data.

  • (Dr. Mark Daniel Ward) Students in Ward's research group will analyze asymptotic properties of randomly-generated sequences and trees using probabilistic generating functions, and simulations in R, as well as some Maple, for solving recurrences and deriving asymptotics. Undergraduates can also work with Ward on stochastic leader election algorithms or on data-driven problems in game theory.
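
As an illustration of the Bag of Little Bootstraps idea mentioned in Dr. Cheng's project above, the following is a minimal Python sketch, not the project's actual code. It estimates the standard error of a sample mean by resampling within several small subsets and averaging the per-subset results. The function name `blb_standard_error`, the subset-size exponent `gamma=0.6`, the number of subsets and resamples, and the simulated Gaussian data are all assumptions chosen for illustration.

```python
import random
import statistics
from collections import Counter


def blb_standard_error(data, n_subsets=5, n_resamples=50, gamma=0.6, seed=0):
    """Estimate the standard error of the sample mean with a BLB-style procedure.

    All parameter defaults are illustrative assumptions, not recommendations.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # size of each small subset
    subset_estimates = []
    for _ in range(n_subsets):
        # Draw a small subset of b points without replacement.
        subset = rng.sample(data, b)
        resampled_means = []
        for _ in range(n_resamples):
            # Resample n points from the subset, stored as counts per subset
            # element, so each resample has the size of the full data set.
            counts = Counter(rng.choices(range(b), k=n))
            weighted_sum = sum(subset[i] * c for i, c in counts.items())
            resampled_means.append(weighted_sum / n)
        # Quality assessment (here, a standard error) within this subset.
        subset_estimates.append(statistics.stdev(resampled_means))
    # Average the assessments across subsets.
    return statistics.mean(subset_estimates)


# Hypothetical data: 2000 draws from a standard normal distribution.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
se = blb_standard_error(sample)
print(f"BLB standard error of the mean: {se:.4f}")
```

For this simulated sample the result can be compared with the textbook value, the sample standard deviation divided by the square root of n. Because each subset is processed independently, the outer loop is the part that would be distributed across parallel workers.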

Research Thrust E: Human Development and Family Studies
  • (Dr. Edward Bartlett) Bartlett's research area is sensory neurophysiology. The research focus is to dissect the neural circuits involved in the neural coding of sound features across the lifespan, from early development through adulthood and age-related decline. Neural data are obtained from recordings of single neurons and neural populations in response to speech-like and simple sounds. In addition, realistic computational models of single neurons or small groups of neurons are constructed to understand the data.

  • (Dr. Sharon Christ) Christ can guide students on analysis of large survey data collected from people. One sample is representative of the children involved with Child Protective Services in the U.S. and the other is representative of adolescents in grades 7-12 in the United States during the 1994-95 school year. These data are longitudinal and involve complex sampling features such as clustering (non-independence) and unequal selection probabilities. As a result, trainees will learn how to apply probability weighted estimation and variance estimates that are robust to clustering. In addition, these surveys suffer from missing data and measurement errors due to the self-reported nature of the data collection. Trainees will learn modeling approaches used to avoid biases due to missing data and measurement error. The statistical analysis will be applied to the study of the effects of maltreatment on adolescent development.

  • (Dr. Sharon Christ) Christ will work with Weber-Fox and her students on the sample of children observed in her audiology lab. In this study, they will work on modeling changes in stuttering over time, and what characteristics are correlated with persistent versus desisted stuttering.

  • (Dr. Sharon Christ) For another study, students can use the National Health Interview Survey (NHIS) to evaluate how occupations are related to smoking, alcohol use, exercise, asthma, heart disease, etc. NHIS is the national data set used to survey the U.S. adult population with respect to health.

  • (Dr. Sharon Christ) For another study, students can use time-series analyses to study sleep patterns in children, especially children diagnosed with autism spectrum disorders.

  • (Dr. Elliot Friedman) Some of our work involves the use of data from large, nationally representative survey-based studies, and a perpetual issue with such studies is missing data. In some cases these data are missing at random (e.g. people skipped a question by accident), and in some cases the missingness may not be random (e.g. possible reluctance to answer questions about income or more sensitive topics). Students will have the opportunity to look for patterns of missing data to determine whether they are random or systematic. They will also devise appropriate strategies for imputing missing values in order to increase the power and reliability of analyses based on these data.

  • (Dr. Elliot Friedman) My work involves the use of health-related data from large, nationally representative survey-based studies. These studies are powerful tools for being able to ask broad questions about population health, but they also present a variety of challenges. Statistics students working with me will have a variety of opportunities to conduct research, ranging from taking on some of these challenges (e.g. how to handle and adjust for missing data) to learning and applying advanced statistical techniques to research specific questions that are best addressed with these kinds of data.

  • (Dr. Lisa Goffman) I study specific language impairment (SLI) in children. Children with SLI show cognitive abilities within normal levels, but significantly impaired language abilities. Although their cognition is typical, it has recently been found that these children also often show impairments in their gross and fine motor skills. Because children with SLI are at risk for long-term social and academic difficulties, there is a critical need to understand the factors underlying their language and motor deficits and to develop efficacious approaches to treatment. In my NIH funded research, we are presently conducting a longitudinal study of children with SLI to better understand how their language and related motor skills change from the preschool to the school age years. We include standard language and motor measures as well as direct recordings of speech and limb movement. Our goal is to better understand how language and motor domains develop in these children, and how they change over the critical early school years.

  • (Dr. Christine Weber-Fox) My work is in neural systems for language processing in typical development and in those with communication disorders such as stuttering or language impairment. I also have clinical experience working in both hospital (outpatient, inpatient, and acute care) and school settings. The motivation to study language processing and its connections to stuttering is apropos for sophomore students, who can readily understand why this topic is important. (The recent success, for instance, of The King's Speech demonstrates that this is a topic of broad concern and interest.) My work also focuses on how neural subsystems may differ in speakers with different language experiences and communication skills, as brain processes for language vary even across individuals with 'normal' language abilities. The types of data to be analyzed in our research group include behavioral/clinical measures from children, such as cognitive test scores, including nonverbal IQ and working memory, as well as detailed measures of their speech and language performance. In addition, our data set includes physiological measures of brain activity (Event-related Brain Potentials, ERPs). As the student becomes familiar with the research goals and methods, the expected outcome for a statistics sophomore is to help manage and analyze large data sets that cross domains (behavioral, electrophysiological) and span longitudinally from 4-9 years of age.

  • (Dr. Ellen Wells) The Deep Green and Healthy Homes project was sponsored by the nonprofit Environmental Health Watch, in Cleveland, Ohio (PI: Stuart Greenberg). It compares two standards of energy efficiency renovations in low-to-moderate income housing in Cleveland, Ohio. Six homes were renovated to standard energy efficiency recommendations (~50% energy savings); six additional homes were renovated to a stricter standard (~75-90% energy savings) and included mechanical ventilation systems to help preserve air quality. Homes were monitored just after renovation and for ~1 year following renovation using new indoor air quality technology, and home visits were conducted every three months to conduct visual inspections, take indoor air quality measurements with field instruments, and collect data from participants via questionnaire. The new indoor air quality monitoring technology incorporates low-cost gas/temperature/relative humidity sensors into a single platform which wirelessly transmits data from the field site to our servers twice per minute. Six parameters are included in the sensors: temperature, relative humidity, CO, CO2, NOx, and total VOCs. We developed calibration equations which incorporate data from all sensors within the unit (the sensors will respond, somewhat more weakly, to a gas similar in structure to its target gas). For most homes we collected more than 1 million rows of data. Potential projects using these data include: Further methodological analysis of calibration/data transmission from the new technology; Correlation of data from new technology compared to standard field instruments; Comparing the two renovation types with regard to air quality or energy use; Description of continuous data patterns from remote monitors on a daily/weekly/etc. scale; Description of data before/during/after an event which would affect air quality (e.g., ventilation system not working, dispersal of an enormous amount of mothballs); Analysis of how occupant behavior may affect energy use/indoor air quality.
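
Dr. Christ's first project above mentions probability weighted estimation and variance estimates that are robust to clustering. A minimal Python sketch of both ideas follows, under stated assumptions: each respondent's weight is the inverse of an assumed selection probability, the design has a single stratum, and the variance uses a standard linearization over cluster totals. The function name and the five records are hypothetical and are not taken from either survey.

```python
import math
from collections import defaultdict


def weighted_mean_with_cluster_se(records):
    """records: list of (cluster_id, value, selection_probability) tuples.

    Returns the probability-weighted mean and a cluster-robust standard
    error (linearization over cluster totals, single stratum assumed).
    """
    # Each unit is weighted by the inverse of its selection probability.
    weights = [1.0 / p for _, _, p in records]
    total_weight = sum(weights)
    mean = sum(w * y for w, (_, y, _) in zip(weights, records)) / total_weight

    # Sum the linearized residuals within each cluster, so that correlation
    # among units in the same cluster is carried into the variance.
    cluster_totals = defaultdict(float)
    for w, (cluster, y, _) in zip(weights, records):
        cluster_totals[cluster] += w * (y - mean) / total_weight

    n_clusters = len(cluster_totals)
    variance = n_clusters / (n_clusters - 1) * sum(
        z * z for z in cluster_totals.values()
    )
    return mean, math.sqrt(variance)


# Hypothetical records: (cluster, outcome, probability of selection).
records = [
    ("A", 10.0, 0.5),
    ("A", 12.0, 0.5),
    ("B", 20.0, 0.25),
    ("C", 14.0, 0.5),
    ("C", 16.0, 0.25),
]
weighted_mean, cluster_se = weighted_mean_with_cluster_se(records)
unweighted_mean = sum(y for _, y, _ in records) / len(records)
print(f"weighted mean: {weighted_mean:.3f}, unweighted mean: {unweighted_mean:.3f}")
print(f"cluster-robust standard error: {cluster_se:.3f}")
```

The weighted and unweighted means differ here because units with lower selection probabilities stand in for more of the population. Real survey analyses would normally rely on dedicated survey software and the design variables supplied with the data.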


Research Thrust F: Statistical Consulting Service
  • (Dr. Bruce Craig and Ms. Ce-Ce Furtner) The SCS has over 200 research consultations/year, serving clients from every College in the University. Any Purdue faculty, staff member, or student can be a client, for free, to receive statistical consulting and advice. Consultants help with proposal preparation, design of studies, data import/export, data analysis, and interpretation and presentation of results. For funding reasons, the SCS only employs grad students, but Director Craig is willing to involve undergraduates from this MCTP project. C. Furtner (Manager), former UG academic advisor, knows what is feasible for undergraduate students. Consulting will: lead UGs to apply for graduate study in Applied Statistics; boost communication skills; and sometimes result in papers with clients. Undergraduate students will attend meetings, led by grad student consultants, and will help with the data analysis. Listening at meetings, UGs will get an early, tangible understanding of how modeling, time series, design of experiments, etc., are used in practice.

Research Thrust G: Coastal Margin Observation & Prediction
  • (Dr. Tawnya Peterson and Dr. António Baptista) (Please note that the summer component of this thrust is in Portland, Oregon, and would require summer travel.) The NSF Science and Technology Center for Coastal Margin Observation & Prediction (CMOP) is dedicated to the study of estuaries as bioreactors that deliver unique ecosystem services, including the filtering of land inputs into the ocean. We use the Columbia River estuary as our long-term testbed, and we support our research through continuous high-resolution observations and simulations of a vast array of multi-disciplinary variables. Diverse opportunities for statistical analysis of the data are available for undergraduate students, in association with understanding physical and ecological processes, assessment and control of the quality of observations, and assessment and improvement of computational models. These opportunities are available during the summer or, by special arrangement, throughout the year. Because of the inter-institutional and inter-disciplinary nature of CMOP, students can work with leading scientists at three universities: Oregon Health & Science University, Oregon State University and University of Washington.

Research Thrust H: Saving Nature with Statistics
  • (Dr. Songlin Fei) Forests provide a wide variety of vital services such as timber and clean water, but they are challenged by the changing climate. Our lab strives to understand the impact of climate change on forests and the resulting impact on future climate. We use continent-wide, long-term (1980-present) data collected by the US Forest Service to address questions such as: Are trees migrating and at what rates? How are recruitment and growth of trees affected? What are the consequences of climate-induced species composition changes? Students can learn how to explore and analyze large data in a spatial and temporal context.

  • (Dr. Songlin Fei) Invasion of exotic plant species has caused serious ecological degradation and economic losses. Our lab is working to build predictive models to understand regional invasion patterns and processes that will advance the discipline of invasion ecology and assist in effective management policy and control practices to combat invasive species. We are interested in questions including: (1) Why are certain exotics more invasive? (2) Why are certain ecosystems more prone to invasion? (3) What are the main factors facilitating invasion? Students interested in this topic can use continent-wide invasion databases (or subsets of them) to explore these or related questions. Students will learn how to manage large datasets and practice model fitting, multivariate analyses, spatial analysis, etc.

  • (Dr. Songlin Fei) Hellbenders, a gigantic, aquatic salamander species found in North America, are declining throughout their range. In Indiana, hellbenders are now confined to a single watershed. In an effort to aid hellbender conservation and management in the state, we are developing local habitat models for hellbenders. This project will involve using classification techniques on a large volume of sonar data to develop a substrate map of the study river. The student will work with a graduate student and a pre-collected data set to come up with novel statistical classification techniques to map river bottom substrate. The resulting substrate classes will then be used as predictor variables within hellbender habitat models.

  • (Dr. Rob Swihart) Wildlife populations and communities are subjected to human influence in innumerable ways, including activities (e.g., hunting) that have direct effects and others (e.g., timber harvest, agriculture) for which effects may occur primarily due to changes in the availability or quality of habitat. My group seeks to understand how wild vertebrates respond to human activities, as this knowledge can be important to minimizing adverse influences. We have conducted work in the Upper Wabash Ecosystem Project and the Hardwood Ecosystem Experiment, which has resulted in large data sets on dozens of wild species (mostly mammals, but also birds, amphibians, and reptiles) and associated covariates for habitat and landscape features. These data are used to address questions such as: (1) How does the intensity of human disturbance affect population abundance and species composition? (2) What makes some species more sensitive than others to human disturbance? (3) What role does spatial scale play in determining wildlife responses? Students can learn how to conduct exploratory analyses and test competing hypotheses using general and generalized linear models.

  • (Dr. Rob Swihart) Successful management and conservation of wildlife depends on understanding the factors (e.g., extreme droughts, disease epidemics, or habitat change) that drive variation in survival and reproduction. Unfortunately, a factor's importance often appears only infrequently or slowly, which requires long-term data sets. Wild mammals are difficult to study, so long-term data sets are rare. For species of game mammals, long-term data sets from unexploited populations are even rarer, despite the fact that an understanding of population dynamics in the absence of hunting is essential. I have inherited a data set collected by students at the Purdue Wildlife Area on a non-hunted population of eastern cottontail rabbits that spans 33 years. These data will be used to ask questions such as: (1) How does climate change influence density and survival of cottontails? (2) Have changes in the plant community over time had measurable impacts on the cottontail population? (3) What influence has the increase in abundance of coyotes, an important predator, had on cottontail numbers? Students can learn how to conduct exploratory analyses and test competing hypotheses using general and generalized linear models.

  • (Dr. Bryan Pijanowski) Work in our Center for Global Soundscapes focuses on the use of long-term soundscape recordings to assess the health of ecosystems around the world. Recently featured in Science as a new area of big data research, soundscape ecology has mushroomed into one of the fastest growing ecological sciences, using advanced sensor and sensor network arrays that combine acoustic information, 3D landscape profiles using LiDAR (light detection and ranging) along with companion time-lapse photography/4K video imaging to characterize the dynamics of a variety of ecosystems around the world. Large-scale soundscape and remote sensing databases exist for exotic places like Borneo (paleotropics), Costa Rica (neotropics), Sonoran Desert (Arizona), Midwestern temperate forests (Indiana, Chicago and Wisconsin), estuaries (Maine) and the subarctic (Alaska). Students could work on any of the following projects mentored by both a graduate student and postdoc: (1) analyze over 70 TB of soundscape data from different ecosystems comparing the spatial-temporal dynamics of these systems; (2) develop multi-media web components for use in citizen science and K-12 learning of sound, ecology, mathematics and technology (enhancing our site at www.globalsoundscapes.org); (3) develop new soundscape ecological metrics that quantify diversity of sounds in files using principles of entropy; and (4) develop new techniques for data mining and pattern recognition using novel statistical tools.

  • (Dr. Patrick Zollner) Our lab's research efforts focus primarily on the ecology of mammals. One approach we use is to deploy infrared remote cameras to capture pictures of and gather data about the mammals of interest. Each such camera typically records thousands of photos at each location and results in challenges associated with a large volume of data. Research ongoing in our lab is collecting such photo data on the occurrence of carnivores (otter, mink and raccoon, etc.) at different locations along rivers in Indiana. These cameras record species activity with both a spatial and a temporal component, and the resulting data provide opportunities to investigate numerous potential projects of interest. For example, this data can provide a basis for developing models of competition between species as a function of both spatial activity patterns and/or daily activity patterns. Alternatively, data from these cameras can be used to examine how species activity patterns and interactions vary as a function of environmental variables (e.g., temperature, precipitation, moon phase or the presence of invasive Asian carp in rivers at some sites). We will work with a student interested in this project to define a feasible and unique question they can investigate using this data set.

  • (Dr. Patrick Zollner) White-nose Syndrome (WNS), a disease caused by a novel fungal pathogen, has devastated bat populations in the eastern and midwestern United States. The effects of WNS have led to the listing of the northern long-eared bat (Myotis septentrionalis) as a federally threatened species. There are large gaps in our understanding of northern long-eared bat habitat use and conservation, and in light of WNS it is increasingly important to understand these relationships in order to protect this once common species. This project will begin by using presence-only occupancy modelling to estimate how landscape-scale environmental variables are associated with northern long-eared bat roosting and foraging habitat from historical locations collected prior to the outbreak of WNS. We will then use a combination of acoustic detectors, bat capture, and radio-telemetry tracking of captured bats during the summers of 2017 and 2018 to determine where these bats remain following their dramatic population declines. Another important comparison we will make will be between habitat used by these bats in fragmented landscapes of northern Indiana relative to similar data from more contiguous forests of southern Indiana. A student helping with this project will have the opportunity to assist with collecting data on bats in the field as well as to help with analyses focused on comparing models of habitat used by these bats in different circumstances. This student's own project could develop from the above ideas or related side projects such as studying the effectiveness of acoustic lures at increasing the probability of capturing these bats.

  • (Dr. Patrick Zollner) White-nose syndrome is a disease caused by an invasive fungal species new to North America; it has caused the death of more than 90% of the individuals of several species of cave-hibernating bats throughout the Midwestern US. Our lab is collecting and analyzing acoustic monitoring data on the occurrence of these now threatened and endangered bat species to use in modelling summer habitat needs of these species. The acoustic bat detectors we are using record large volumes of bat echolocation calls, and we have access to such data from several regions of Indiana both before and after the arrival of the destructive fungus. A student working on this project would use these acoustic records to evaluate and validate the suitability of models we have developed from similar but independent observations. The improved habitat models resulting from this work will have important applications in the conservation of these bat species as well as the management of Indiana's forests.

  • (Dr. Michael Saunders) What effects does timber harvesting have on forest ecosystems? - The Hardwood Ecosystem Experiment (HEE) in southern Indiana investigates the influence of forest management on various plant and animal communities within oak-dominated ecosystems. The HEE maintains a large geospatial database with repeat inventories of trees and shrubs, terrestrial vertebrates (see also "topic 4" from Dr. Swihart), and insects (see also "topic 12" from Dr. Holland). Initially, this project would evaluate the effects of forest harvesting on tree and shrub communities, but could be extended to work on other taxonomic groups. There would be opportunities for travel to the sites and to present the results at regional conferences.

  • (Dr. Michael Saunders) How do trees grow wood? - Production ecology has been generally well studied in conifer tree species, but not in hardwood tree species. Theoretical relationships developed in conifer-dominated forest stands may or may not apply to our Indiana hardwood-dominated forests. This project will use a vast database of tree heights, diameters, and stem taper to model how walnut plantations grow. We will investigate relationships between the amount of leaves a tree displays and the amount of wood that tree produces each year. We will also study how manipulations of leaf area through pruning affect growth. There will be opportunities for the work to be extended to American chestnut, oak and other hardwood species.

  • (Dr. Tomas Höök) Research in our lab focuses on aquatic ecology and, in particular, the dynamics of the Laurentian Great Lakes. Given that each of the Great Lakes is quite large, they are governed by almost oceanic scale physical processes and characterized by high spatial variability of physical features and biotic factors. Spatial description of variable biotic factors (e.g., fish distributions, growth rates) can contribute to hypothesis development of processes structuring biotic dynamics, while spatially comparing such biotic factors with other physical, chemical or biotic variables can evaluate hypotheses and potentially help identify mechanistic linkages. The objective of this research would be to a) describe spatial patterns of fish distributions and growth in Lake Michigan and b) relate these patterns to physical (e.g., satellite-derived surface temperature and water clarity) and biotic (e.g., chlorophyll concentrations, zooplankton densities) factors.

  • (Dr. Tomas Höök) Northern Indiana contains ~450 natural lakes that provide diverse services, from boating to swimming to fishing to flood control. However, these services are often at odds with land-use practices and human activities. In particular, nutrient runoff (primarily phosphorus) from row crop agriculture to surface waters contributes to eutrophication, including harmful algal blooms, hypoxia, and local extinctions of sensitive species. We have access to a wealth of data from GIS databases and state and university monitoring programs related to fish community composition, water quality, lake morphometrics and land use on the land draining into Indiana's natural lakes. The objective of this research would be to quantitatively model linkages among these different types of variables; for example, evaluating how agricultural practices on lands draining into glacial lakes influence water quality, habitat conditions and resulting fish biodiversity within these lakes.

  • (Dr. Jeffrey D. Holland) Students in my laboratory study how the pattern of land use and habitat in different landscapes influences ecological processes involving insects such as individual dispersal and exotic species invasion, ecosystem services (e.g., pollination, predation of pests, decomposition), and maintenance of biodiversity. We simultaneously study the impact of local habitat and human activities and the larger scale landscape context. To study the landscape-insect link, we make use of extensive field surveys of insects and habitat combined with satellite and aerial data, geographical information systems, and spatial and multivariate statistics. A sample of projects students could become involved in includes: examining the impact of silvicultural regimes on the functional diversity of forest beetles, spatial analysis of aquatic insect communities, and simulation modeling of insect movement across landscapes.

  • (Dr. Esteban Fernandez-Juricic) Birds and airplanes collide regularly at airports across the world. These bird strikes are a source of mortality for many species of conservation concern as well as a safety and financial concern for the airline industry. In the US, the Federal Aviation Administration has compiled a database of bird strikes since 1990. We use this large database to answer key questions to better understand the environmental conditions that enhance the occurrence of bird strikes: (1) Are airports close to biodiversity hot-spots more likely to have a higher frequency of bird strikes involving species of conservation concern? (2) Does landscape composition around airports influence the probability/frequency of bird strikes? (3) Does habitat structure within airports influence the probability of bird strikes? (4) What is the role of regional and local bird densities in affecting bird strike frequency? (5) Do the color, speed, and shape of commercial airplanes influence the probability of bird strikes? Students will learn how to manage large databases and use general and generalized linear mixed models as well as multivariate statistics. The answers to these questions have widespread management implications for reducing the frequency of bird strikes.

  • (Dr. Esteban Fernandez-Juricic) Bird feeders are used by multiple species of birds throughout North America. However, bird feeders are not necessarily built to attract birds (for instance, to hold seeds some use Plexiglas, which blocks the ultraviolet portion of the spectrum that many bird species use to find food visually). We have developed a behavioral assay to test, in aviary conditions, novel bird feeders (different shapes, colors, etc.) designed with the avian visual system in mind, which is very different from our own visual system. Through these assays, we have collected data to determine the combination of features that would increase the chances of bird visitation and seed consumption. Students will learn how to run these behavioral assays and use general and generalized linear mixed models to analyze the data. The results of this project have implications for increasing bird diversity in urbanized landscapes.

Research Thrust I: More Applications
  • (Dr. Dennis Buckmaster) Management Zone Identification: In crop production systems, fields are now often subdivided into management zones based on past yield performance and key traits such as soil texture, soil particle sizes, drainage class, depth of topsoil, and topography. Good management zone identification results in tight distributions of performance within a zone so that the ideal seed, population, fertilizer, etc. can be applied. The research project is to explore different traits which result in "tight" management zones of reasonable size with consistency. This particular project deals with several layers of geo-referenced data as well as time-series data which should be cross-referenced with weather data.

  • (Dr. Dennis Buckmaster) TrialsTracker - analytics for cropping systems: The Purdue Open Ag Technology and Systems Center has developed TrialsTracker as a webapp capable of analyzing georeferenced data according to tags/flags/traits. The app has some unique data management attributes which enable operation on mobile devices across large differences in scales and zoom levels. By this, we mean that a user can identify polygons and rapidly determine statistics and differences between the data from this region and other similar/different regions (aggregate by finger). The tool could use some assessment and upgrades in the analytics tools as well as the presentation of the data and results to users. An interesting angle to take would be to identify data quantity/quality needs to draw certain conclusions.

  • (Dr. Dennis Buckmaster) Analysis of autonomous agricultural vehicles: As part of collaborations with industry, the Open Ag Technology and Systems Center will have access to large data sets of machine operations in manual/conventional systems as well as in autonomous operations. The charge would be to evaluate and assess the differences between the systems so as to justify the additional expense of autonomous machinery. The data sets would be georeferenced and include aspects such as fuel consumption, engine load, productivity, labor required, crop performance, timeliness, and uniformity of work. Interesting questions regarding the extent of data needed to make confident statements will most certainly arise.

  • (Drs. Dennis Buckmaster and James Krogmeier) With access to soil, weather, crop yield, and topography data, students can pursue spatial interpolation and relationships between layers of crop and climatological data. Pursuit of important contextual data can help improve efficiency of production which is good for food, fuel, feed, and fiber prices as well as beneficial to the environment. Agriculture is a bit late to the big data boom, and potential impact is great. Opportunities to improve methods for analyzing geospatial agricultural data abound.

  • (Dr. Dennis Buckmaster) The USDA National Agricultural Statistics Service offers data regarding production and economics of agricultural commodities. Exploration of functional relationships between gross production and regional climatological data might lend insight regarding key management decisions and possibly the effects of climate variability. Creative thinking and model development with data from different sources will be the challenge. If we can gain access to reasonably large quantities of field-specific data, validation with data sets representing smaller areas would be the goal.

  • (Drs. Dennis Buckmaster and James Krogmeier) The next wave of improvement in agricultural labor productivity will be driven by the introduction of autonomous machines. Current innovations in precision agriculture were leveraged by the availability of auto-steer, which directly enhanced labor productivity in field operations and paid for the communications, computing, and control systems which were later used to enable other functions. To realize the labor-saving benefit of autonomous machines, it will be necessary that they autonomously navigate roadways. This necessity comes with its own set of hazards that do not show up in the field. As a small but important step on the road map to full ag machine autonomy, this paper will investigate how fixed rural road hazards may be mapped via crowdsourcing and machine learning. To this end, we propose a mobile phone "slow moving vehicle" app that uploads its location to the cloud when moving slowly on rural roads. By looking at data collected automatically by this app, we will be able to determine the location of fixed hazards on the road. Using crowdsourced data from the app, we classify and filter hazards as fixed or moving. By looking for anomalies in the GPS track data, such as deviations around fixed road hazards, we could determine locations for mailboxes, signs, and utility poles. The app will also broadcast the location of the machine to the phones of surrounding motorists as an alert. Additional hazards include bridges, low hanging power lines, and other vehicles. Both the system architecture for the hazard app and the machine learning algorithms for track anomaly detection will be described.

  • (Dr. Vetria Byrd) Big Data and Uncertainty Visualization: This project has applications under visualization, uncertainty and Big Data. Students learn the importance of visualization and the role it plays in discovery and scholarship, and explore real-world datasets in an environment capable of viewing and manipulating Big Data sets. The project will allow for exploration of and training in uncertainty visualization. Students will work with publicly available data, including climate datasets from the National Oceanic and Atmospheric Administration website, health data website, the US Census Bureau, and the United States Department of Agriculture, to create data visualizations that will aid in decision making.

  • (Dr. Vetria Byrd) Making Sense of Big Data at Multiple Levels of Abstraction: This project will explore the role of visualization in providing multiple levels of abstraction. Students will explore the development of a novel user interface technology allowing users to visualize plans of ~500 activities for a single day's operation at multiple levels of abstraction. The goal is to create visual interfaces that help users track performed activities, identify constraint violations, visualize contingencies and suggest new plans.

  • (Dr. Vetria Byrd) Has new projects in: [1] Visualizing data from wearable devices. [2] Integrating Heterogeneous Data & Visualizing Heterogeneous Data (These two are big enough to be two separate projects so I have listed them as two, but they are related) [3] Utilizing Visual Analytics in Lupus Research for Children With Lupus [4] Creating open visualization tools [5] Mining Indiana Department of Education Data.

  • (Dr. Andreas Jung) Research projects for undergraduate students are in the context of the analysis of the data recorded with the Compact Muon Solenoid (CMS) detector at the LHC, with the goal of measuring the top quark coupling to the Higgs boson. This coupling strongly influences the answer to the question of whether the Universe is in a stable, meta-stable or unstable vacuum state. Together with the precision measurement of top quark properties, results can pinpoint contributions from beyond the standard model, which is what all particle physicists have been searching for for decades. The data analysis relies heavily on methods that include topological multivariate analysis, regularized matrix unfolding and complementary profiling techniques. There is also the opportunity to contribute to the data analysis of silicon detector prototypes that are tested for their thermal performance. These thermal tests are carried out under the same conditions as the real CMS detector and use the existing capabilities of the Purdue Silicon Detector Laboratory in the Physics building.

  • (Drs. James Krogmeier and Darcy Bullock) We have access to high resolution traffic data regarding anonymized trips of individual vehicles on Indiana state highways. Coupled with traffic signal controller state information, road geometry, weather, road construction, and incident data we would like to explore various ways to visualize and analyze the data.

  • (Dr. Yung-Hsiang Lu) Have you noticed the cameras at traffic intersections? Would you like to see the data from these traffic cameras? What information can you obtain from the traffic cameras? Imagine that you can see 10,000 traffic cameras from New York, Chicago, Boston, London, Paris, Atlanta, etc. What can you do with the data? Can you develop computer programs to count the number of vehicles passing through these intersections? Can you count the number of people crossing the streets? Can you discover the latest fashion trend based on people's clothing? Can your programs count accurately in different weather, day and night? Would your programs be able to discover different driving habits in different cities? A research group at Purdue University has created the world's largest camera network, called Continuous Analysis of Many CAMeras (CAM2, https://www.cam2project.net/). CAM2 is capable of analyzing vast amounts of real-time data from network cameras worldwide. CAM2 provides versatile data that can be challenging for existing machine learning solutions. If you want to develop computer programs that can understand the world through thousands of network cameras, you would find this project challenging and exciting.

This material is based upon work supported by the National Science Foundation under Grant No. 1246818. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.