Non-parametric statistics

Get ready for 2024 and a brand-new episode! We discuss non-parametric statistics in data analysis and AI modeling. Learn more about their applications in user research methods, as well as key assumptions in statistics and data modeling that must not be overlooked.

Show notes

Welcome to 2024 (0:03)

  • AI, privacy, and marketing in the tech industry
  • OpenAI's GPT store launch. (The Verge)
  • Google's changes to third-party cookies. (Gizmodo)

Non-parametric statistics and its applications (6:49)

  • A solution for modeling when little is known about the data's underlying distribution.
  • Contrast non-parametric statistics with parametric statistics, plus their respective strengths and weaknesses.

Assumptions in statistics and data modeling (9:48)

Statistical distributions and their importance in data analysis (15:08)

  • Discuss the importance of subject matter experts in evaluating data distributions, since wrong assumptions about the shape of the data can lead to lost statistical power and incorrect modeling.
  • Examples of distributions suited to different situations: Poisson for counts of events, exponential for wait times between events, and continuous distributions such as the uniform and the Gaussian (normal) for continuous measurements.
  • Consider the complexity of selecting the appropriate distribution for statistical analysis; this requires understanding the chosen distribution and its properties.
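Counts and wait times are closely linked, which is easy to see in a small simulation. This is an illustrative sketch, not something from the episode: it assumes a constant event rate (a homogeneous Poisson process), and the function name `simulate_counts` is hypothetical. When the waits between events are exponential with rate λ, the number of events per unit of time is Poisson with mean λ, and a Poisson variable's variance also equals λ.

```python
import random
import statistics

def simulate_counts(rate, n_intervals, seed=0):
    """Hypothetical helper: count events per unit interval when the waits
    between events are exponential with the given rate (constant-rate assumption)."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)          # time of the first event
    while t < n_intervals:
        counts[int(t)] += 1            # event lands in interval floor(t)
        t += rng.expovariate(rate)     # exponential wait until the next event
    return counts

counts = simulate_counts(rate=3.0, n_intervals=10_000)
# For Poisson counts, the mean and the variance should both be close to the rate.
print(statistics.mean(counts), statistics.pvariance(counts))
```

With a rate of 3, both printed values should land near 3. Count data whose variance is far larger than its mean (overdispersion) is one sign that a Poisson assumption does not fit.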

Non-parametric statistics and its applications in data analysis (19:31)

  • Non-parametric statistics are more robust to outliers and can generalize across different datasets without requiring domain expertise or data massaging.
  • Many methods rely on rank ordering. They have less statistical power than parametric methods when the parametric assumptions hold, but they are more flexible and handle complex datasets better.
  • Discussion of their usefulness and limitations: non-parametric tests typically need more data than parametric tests to detect a meaningful change.
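Rank ordering and robustness to outliers can be illustrated with the Mann-Whitney U statistic, one common rank-based test (the episode notes do not name a specific test). This is a minimal sketch that computes only the statistic, not a p-value, and the helper names are hypothetical. Because U depends only on the ordering of the pooled values, turning the largest observation into an extreme outlier leaves it unchanged.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for sample x: number of pairs (x_i, y_j) with x_i > y_j, ties counting 1/2."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2

a = [1.2, 2.4, 3.1, 4.8, 5.0]
b = [2.0, 3.5, 6.1, 7.3, 8.8]
b_outlier = [2.0, 3.5, 6.1, 7.3, 8800.0]  # same ordering, one extreme value
print(mann_whitney_u(a, b), mann_whitney_u(a, b_outlier))  # 6.0 6.0
```

Both calls return 6.0, while the mean of the second sample jumps from 5.54 to over 1,700. A mean-based parametric comparison would be pulled by that single value; the rank-based statistic is not. In practice, SciPy's `scipy.stats.mannwhitneyu` computes the statistic along with a p-value.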

Non-parametric tests for comparing data sets (24:15)

  • Non-parametric tests, such as the Kolmogorov-Smirnov (K-S) test and the chi-square test, can compare two sets of data without assuming a specific distribution.
  • Non-parametric methods can also be used for machine learning classification and regression tasks, even when the underlying data distribution is unknown.
  • Normalize data before conducting hypothesis tests.
  • Apply feature engineering and scaling before using distance-based methods like k-nearest neighbors.
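The two-sample K-S test compares the empirical cumulative distribution functions (ECDFs) of two datasets; its statistic D is the largest vertical gap between them, so no distribution shape is assumed. The sketch below computes only D, and the function name `ks_statistic` is hypothetical. In practice, a library routine such as SciPy's `scipy.stats.ks_2samp` also reports a p-value.

```python
import bisect

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_x(v) - ECDF_y(v)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The largest gap between two step-function ECDFs occurs at an observed point.
    for v in xs + ys:
        f_x = bisect.bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        f_y = bisect.bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        d = max(d, abs(f_x - f_y))
    return d

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

D ranges from 0 (identical ECDFs) to 1 (no overlap at all). Because it depends only on how the values are ordered, applying the same strictly increasing transformation to both samples leaves D unchanged.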

Non-parametric testing in AI modeling (30:37)

  • Understanding non-parametric tests and their applications in modeling.

Do you have a question or a discussion topic for the AI Fundamentalists? Let's connect.

  • LinkedIn - Episode summaries, shares of cited articles, and more.
  • YouTube - Was it something that we said? Good. Share your favorite quotes.
  • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.