DA 350 Labs Tackle Real Life Data Problems

The mathematical methods used in Data Analytics are constantly evolving.  Traditional methods learned in introductory statistics courses such as hypothesis testing and linear regression were designed before the era of computers and could be calculated by hand on small data sets.  With the rise of massive computational power, new methods are constantly being developed that give better predictions and lead to better decisions across many domains.  In DA 350, Advanced Methods for Data Analytics, students explore some of these modern methods and use them to solve a wide variety of practical problems with real life data sets.  Some of the lab projects students have worked on:

    • Teaching a computer to recognize and sort Egyptian Hieroglyphics. Students were given 4,000 images of hieroglyphics from an ancient Egyptian book and wrote programs to sort those images into different categories: all the ankhs were together, all the birds together, etc.  The catch? Students didn’t know how many categories there even were!  These sorts of tools can be useful for researchers wanting to automatically decipher the meaning of a page, and similar ideas let the Post Office automatically sort mail by reading handwritten zipcodes.DA 350 labs - egyptiantexts3-150x150.jpg image #0
    • Creating a system to recommend new movies to viewers. Our favorite websites that give us books to read, movies to watch, music to listen to, and products to shop for often give us recommendations of new content we may enjoy.  Students used data from over 20 million movie reviews to suggest new movies for viewers based on the reviews they had given to other movies.
    • Detecting fraud in credit card transactions. Everyday, millions of credit card transactions occur across the world, and banks are extremely good at automatically detecting and halting fraudulent use.  Students built systems to classify transactions as valid or fraudulent, carefully considering the tradeoffs involved.  A challenge with this is the relative proportions: Less than one hundredth of one percent of transactions are fraudulent, which can pose a challenge to learn from the data.
    • Predicting winners of the March Madness Basketball tournament, using historical data on each team’s performance in areas such as points per game, rebounds, and turnovers.
    • Online advertisement placement: which ad is a user most likely to click on, and how should it be placed on the page? This is a classic example of an “exploration versus exploitation” problem, in which you need to decide when to explore new possibilities and when to stick with something you know.  This type of problem is highly relevant in clinical trials of new drugs for disease: as we start to gather results about the disease progression of patients on an experimental drug versus a control drug, when are we convinced we have sufficient evidence that the new drug is better?
    • Allocating the budget of a soup kitchen to feed healthy meals to the most people.  Combining nutritional data on foods with pricing data, students use optimization models to decide how to spend the budget.  Careful consideration is given to the humanistic factors involved, such as not offering a repetitive diet, and being sensitive to food allergies.

Students leave DA 350 equipped with the skills and knowledge they need to use advanced data analytics methods to solve real world data problems.