# Analysis of Wadge et al., Cortex 2019

This notebook demonstrates the analysis of communicative behavior produced during experimentally controlled interactions between autistic and neurotypical participants.

Participants were assigned pairwise to either the ASD group (7 pairs, each containing two individuals with ASD), the Typical group (11 pairs, each containing two individuals with no clinical diagnosis), or the Mixed group (8 pairs, each including one individual with ASD and one individual with no clinical diagnosis).

To get started, let's clone the course GitHub repository, which has a directory **data** containing our experimental files.

In [None]:
!git clone https://github.com/StolkArjen/interacting-minds.git
# you should be seeing a folder named 'interacting-minds' appearing in your workspace (folder icon on the left)

# just FYI, to remove the folder, use in separate code block: !rm -rf interacting-minds
# to clear all outputs, go to Edit > Clear all outputs, followed by Runtime > Restart

Our data are located in interacting-minds/data/WadgeCortex19. But what files are in this directory? Let's create an inventory

In [None]:
import os, sys
from glob import glob

data_dir = os.path.join(os.getcwd(), 'interacting-minds', 'data', 'WadgeCortex19')
files = glob(os.path.join(data_dir, '*'))

files # show

What information is in these files? Let's read one of them using **pandas** functionality

In [None]:
import pandas as pd

df = pd.read_csv(os.path.join(data_dir, 'A.csv'))

df # show

Let's plot some data. For instance, the number of moves subjects 1 and 2 made on each interaction while playing the game. Let's also plot their averages.

In [None]:
import matplotlib.pyplot as plt

# number of moves
plt.figure()
plt.plot(df['S1_NMoves'])
plt.plot(df['S2_NMoves'])
plt.xlabel('Interactions')
plt.ylabel('Number of moves')
plt.legend(['S1','S2'])

# average number of moves
S1_NMoves_mean = df['S1_NMoves'].mean()
S2_NMoves_mean = df['S2_NMoves'].mean()
plt.figure()
plt.bar(['S1','S2'], [S1_NMoves_mean, S2_NMoves_mean])
plt.ylabel('Number of moves')

# average number of moves over odd trials
S1_NMoves_mean = df['S1_NMoves'][0::2].mean()
S2_NMoves_mean = df['S2_NMoves'][0::2].mean()
plt.figure()
plt.bar(['S1','S2'], [S1_NMoves_mean, S2_NMoves_mean])
plt.ylabel('Number of moves - odd trials')

# average number of moves over even trials
S1_NMoves_mean = df['S1_NMoves'][1::2].mean()
S2_NMoves_mean = df['S2_NMoves'][1::2].mean()
plt.figure()
plt.bar(['S1','S2'], [S1_NMoves_mean, S2_NMoves_mean])
plt.ylabel('Number of moves - even trials')

What do you notice? The number of moves made by subjects 1 and 2 seems to covary globally over the experiment. But there's also fine-grained structure: subject 1 makes more moves than subject 2 during odd trials, with the reverse being true for even trials. What could explain these patterns?
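
The odd/even split above relies on Python's zero-based slicing: `[0::2]` starts at row 0, which is trial 1, so it picks the odd-numbered trials, while `[1::2]` picks the even-numbered ones. A minimal sketch with made-up move counts (the numbers are hypothetical, not taken from the data files):

```python
from statistics import mean

# hypothetical move counts for eight consecutive trials (not real data)
s1_moves = [9, 4, 10, 5, 11, 4, 9, 6]
s2_moves = [4, 8, 5, 9, 4, 10, 5, 9]

# rows are zero-indexed, so index 0 is trial 1 (an odd trial)
s1_odd, s1_even = s1_moves[0::2], s1_moves[1::2]
s2_odd, s2_even = s2_moves[0::2], s2_moves[1::2]

print('odd trials :', mean(s1_odd), mean(s2_odd))
print('even trials:', mean(s1_even), mean(s2_even))
```

The same slices work positionally on a pandas Series, as in the cell above.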

Let's continue and read the data from all pairs while calculating their overall joint communicative success

In [None]:
import os

files = sorted(glob(os.path.join(data_dir, '*.csv')))
score = {}
for l in files:

  # pair success
  tmp = xxxxx # import the file as a (comma-separated) pandas dataframe
  success = sum(tmp['Accuracy'])/80 # out of 80 trials total

  # store in a dictionary
  pair = os.path.split(l)[-1][0]
  score[pair] = success

print(score) # show

Uh oh... can you fix it? I think that line was reading in the pair's csv file...
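
A note on the dictionary key used in the loop: `os.path.split(l)[-1]` returns the file name without its directory, and `[0]` takes its first character. Below is a small sketch of that step, using a hypothetical path that is only assumed to follow the `A.csv` naming seen earlier:

```python
import os

# hypothetical path, assuming files are named by pair letter (e.g. 'A.csv')
path = os.path.join('interacting-minds', 'data', 'WadgeCortex19', 'A.csv')

fname = os.path.split(path)[-1]   # file name without directories
pair = fname[0]                   # first character of the file name
print(fname, pair)

# alternative that would also handle labels longer than one character
pair_alt = os.path.splitext(os.path.basename(path))[0]
print(pair_alt)
```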

Now let's split and plot the data according to pair type

In [None]:
import numpy as np

# pair types
ASD = ['A','B','C','D','E','F','L'] # autistic pairs
Typ = ['G','H','R','S','T','U','V','W','X','Y','Z'] # neurotypical pairs
Mix = ['I','J','K','M','N','O','P','Q'] # mixed pairs

# success per pair type
ASD_success = [score[k] for k in ASD]
Typ_success = [score[k] for k in Typ]
Mix_success = [score[k] for k in Mix]

# summary statistics
ASD_success_mean = np.mean(ASD_success)
Typ_success_mean = np.mean(Typ_success)
Mix_success_mean = np.mean(Mix_success)
ASD_success_std = np.std(ASD_success)
Typ_success_std = np.std(Typ_success)
Mix_success_std = np.std(Mix_success)

# bar charts with error bars
plt.figure()
plt.bar(['ASD','Mix','Typ'], [ASD_success_mean, Mix_success_mean, Typ_success_mean], yerr=[ASD_success_std, Mix_success_std, Typ_success_std])
plt.ylabel('Joint success (proportion)') # scores are fractions of 80 trials, not percentages
plt.savefig('success.pdf')

What do you observe? There seems to be a lot of variability, especially in the ASD and Mixed pairs. Let's see if some of that variation can be explained by the pairs' IQ, which is stored in the subject summary Excel sheet.
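
A side note on the error bars: they come from `np.std`, which by default divides by the number of pairs (the population standard deviation, `ddof=0`). With only 7 to 11 pairs per group, it may be worth comparing this with the sample standard deviation or the standard error of the mean. A minimal sketch using the standard library and hypothetical success rates (not the real scores):

```python
from math import sqrt
from statistics import pstdev, stdev

# hypothetical success rates for one group of pairs (not real data)
success = [0.50, 0.70, 0.90]

sd_pop = pstdev(success)              # divides by n, like np.std(success)
sd_sample = stdev(success)            # divides by n - 1, like np.std(success, ddof=1)
sem = sd_sample / sqrt(len(success))  # standard error of the mean

print(sd_pop, sd_sample, sem)
```

Which of these to plot is a design choice; the standard deviation describes spread across pairs, whereas the standard error describes uncertainty about the group mean.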

In [None]:
# read in the subjects information sheet
x = pd.read_excel(os.path.join(data_dir, 'Subject_Summary.xlsx'))

x # show

Let's extract and store the pairs' mean and minimum IQ in dictionaries

In [None]:
IQ_mean = dict(zip(x['Pair_name'], x[['S1IQ','S2IQ']].mean(axis=1)))
IQ_min = dict(zip(x['Pair_name'], x[['S1IQ','S2IQ']].min(axis=1)))

# sort alphabetically
IQ_mean = dict(sorted(IQ_mean.items()))
IQ_min = dict(sorted(IQ_min.items()))

plt.figure()
plt.scatter([IQ_mean[k] for k in IQ_mean], [score[k] for k in IQ_mean])
plt.scatter([IQ_min[k] for k in IQ_min], [score[k] for k in IQ_min])
plt.xlabel('IQ')
plt.ylabel('Joint success (proportion)') # scores are fractions of 80 trials, not percentages

Which one is the better fit? Let's account for variance explained by IQ, and see whether the effects of reduced communicative success in pairs containing autistic individuals persist

In [None]:
import statsmodels.api as sm
from scipy import stats

# linear regression test of the effect of mean IQ on score
y = [score[k] for k in IQ_mean]
X = stats.zscore([IQ_mean[k] for k in IQ_mean])
X = sm.add_constant(X) # adding a constant to get an intercept

lr_mean = sm.OLS(y, X).fit()
lr_mean.summary()

In [None]:
# linear regression test of the effect of minimum IQ on score
y = [score[k] for k in IQ_min]
X = stats.zscore([IQ_min[k] for k in IQ_min])
X = sm.add_constant(X) # adding a constant to get an intercept

lr_min = sm.OLS(y, X).fit()
lr_min.summary()

Minimum IQ appears to have a statistically significant influence on joint communicative success, even more so than mean IQ. Which parameters in the above tables support this conclusion?  
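
As a sketch of what the slope and R-squared in such a table measure, the snippet below fits a one-predictor least-squares line by hand. It assumes the predictor is z-scored with the population standard deviation, which is the default of `scipy.stats.zscore`; the IQ and score values are invented for illustration and are not the real data.

```python
from statistics import mean, pstdev

# hypothetical pair-level values (not real data)
iq = [90.0, 100.0, 110.0, 120.0, 130.0]
score = [0.55, 0.60, 0.72, 0.70, 0.88]

# z-score the predictor (mean 0, population SD 1)
iq_mean, iq_sd = mean(iq), pstdev(iq)
z = [(v - iq_mean) / iq_sd for v in iq]

# least squares with a z-scored predictor:
# the intercept is the mean score, the slope is the mean of z * (score - mean score)
intercept = mean(score)
slope = mean(zi * (yi - intercept) for zi, yi in zip(z, score))

# R-squared: share of the variance in score captured by the fitted line
pred = [intercept + slope * zi for zi in z]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(score, pred))
ss_tot = sum((yi - intercept) ** 2 for yi in score)
r_squared = 1 - ss_res / ss_tot

print(intercept, slope, r_squared)
```

Because the predictor is z-scored, the slope reads as the change in success per standard deviation of IQ, which makes the mean-IQ and minimum-IQ fits directly comparable.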

Why would we want to account for IQ when there are no group differences in IQ (according to the statistics reported in the article)? The following code "regresses out" the contribution of IQ to communicative success

In [None]:
# predicted contribution of minimum IQ to success rates
X[:,0] = 0 # zero out constant to estimate the effect of IQ alone
y_pred = lr_min.predict(X)

# residuals after accounting for that contribution
res = (y - y_pred)

print(np.c_[res, y]) # show IQ adjusted scores alongside original scores

Look, for instance, at the first row (pair A). This pair had high performance but apparently also a high IQ, because their score drops substantially after correcting for IQ
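
The adjustment can be sketched by hand. Zeroing the constant column means the prediction contains only the IQ term, `slope * z`, so subtracting it removes the IQ-related part of each score while leaving the overall mean in place. The values below are hypothetical, and the slope is computed with a hand-written least-squares formula rather than statsmodels:

```python
from statistics import mean, pstdev

# hypothetical minimum IQ and success rate per pair (not real data)
iq_min = [85.0, 95.0, 105.0, 115.0, 125.0]
score = [0.80, 0.55, 0.70, 0.75, 0.90]

# z-scored predictor and least-squares slope
m, s = mean(iq_min), pstdev(iq_min)
z = [(v - m) / s for v in iq_min]
score_mean = mean(score)
slope = mean(zi * (yi - score_mean) for zi, yi in zip(z, score))

# IQ-only prediction (constant zeroed out) and the adjusted scores
iq_part = [slope * zi for zi in z]
adjusted = [yi - pi for yi, pi in zip(score, iq_part)]

for raw, adj in zip(score, adjusted):
    print(f'{raw:.2f} -> {adj:.3f}')
```

In this sketch, pairs with above-average IQ move down and pairs with below-average IQ move up, while the mean of the adjusted scores equals the mean of the raw scores.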

In [None]:
import string

# put back in dictionary format and plot as before
score_adj = {}
keys = list(string.ascii_uppercase)
for count, key in enumerate(keys):
  score_adj[key] = res[count] # key-value pair

# pair types
ASD = ['A','B','C','D','E','F','L'] # autistic pairs
Typ = ['G','H','R','S','T','U','V','W','X','Y','Z'] # neurotypical pairs
Mix = ['I','J','K','M','N','O','P','Q'] # mixed pairs

# success per pair type
ASD_success = [score_adj[k] for k in ASD]
Typ_success = [score_adj[k] for k in Typ]
Mix_success = [score_adj[k] for k in Mix]

# summary statistics
ASD_success_mean = np.mean(ASD_success)
Typ_success_mean = np.mean(Typ_success)
Mix_success_mean = np.mean(Mix_success)
ASD_success_std = np.std(ASD_success)
Typ_success_std = np.std(Typ_success)
Mix_success_std = np.std(Mix_success)

# bar charts with error bars
plt.figure()
xxxx
xxx
plt.savefig('success_adj.pdf')

Recognize this plot?
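
One detail of the cell above: the loop pairs residuals with the letters A to Z by position, which works as long as there are 26 pairs and the IQ dictionary is sorted alphabetically. A sketch of a variant that reuses the dictionary keys directly, so the mapping does not depend on that assumption (the keys and residuals below are hypothetical stand-ins for `IQ_min` and `res`):

```python
# hypothetical stand-ins for the sorted IQ dictionary and the residuals
IQ_min = {'A': 110, 'B': 95, 'C': 102}
res = [0.61, 0.74, 0.58]

# pair each residual with the key that produced it
score_adj = dict(zip(IQ_min, res))
print(score_adj)
```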

Let's perform an analysis of variance (ANOVA) on these values

In [None]:
# One-way ANOVA
fvalue, pvalue = stats.f_oneway(ASD_success, Mix_success, Typ_success)
print(fvalue, pvalue) # indicating a statistically significant effect of group
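
As a sketch of what `stats.f_oneway` computes, the F value is the ratio of between-group to within-group variability, each divided by its degrees of freedom. The snippet below works this out in plain Python on hypothetical group scores (not the real data); the p-value step is omitted because it needs the F distribution.

```python
from statistics import mean

# hypothetical adjusted success rates per group (not real data)
groups = {
    'ASD': [0.50, 0.60, 0.55],
    'Mix': [0.60, 0.70, 0.65],
    'Typ': [0.80, 0.85, 0.90],
}

values = [v for g in groups.values() for v in g]
grand_mean = mean(values)
k, n = len(groups), len(values)  # number of groups, number of observations

# between-group sum of squares: group means around the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# within-group sum of squares: scores around their own group mean
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f_value)
```

A large F means the group means differ by more than the spread within groups would suggest, which is what the post-hoc comparisons then break down pair by pair.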

Qualify the effects further using post-hoc comparisons

In [None]:
!pip install scikit_posthocs

In [None]:
import scikit_posthocs as sp

df = pd.DataFrame({'score': ASD_success + Mix_success + Typ_success,
                   'group': np.repeat(['ASD', 'Mix', 'Typ'], repeats=[len(ASD), len(Mix), len(Typ)])})
print(df)

sp.posthoc_ttest(df, val_col='score', group_col='group', p_adjust='fdr_tsbky', pool_sd=False)