Neuroimaging the programmer’s brain: an EEG analysis of semantic and syntactic anomalies in the comprehension of computer languages.

July 27, 2008

Study proposal, draft #1

Background:

Event-related potential (ERP) studies of language processing have identified several components that have since been interpreted as electrophysiological correlates of perceptual and cognitive operations (David, Harrison, and Friston, 2004).  The two ERPs that will constitute the focus of this experiment are the N400 and the P600, both of which are typically largest over centro-parietal scalp sites.  The N400 is a negative component elicited by semantically anomalous or unexpected words in a sentence; it appears 300–500 milliseconds after the onset of the target word.  The P600 is a positive component that has been observed in response to syntactic anomalies in a sentence; it appears 500–800 milliseconds after the onset of the target word.

The onset and amplitude of the N400 and the P600 vary with several documented factors, including contextual validity, type of anomaly, and the subject's familiarity with the language of the presented stimuli.  A series of studies by Kutas and Hillyard (1980, 1983, 1984) showed that contextually inappropriate words produce a large-amplitude negative component that peaks approximately 400 ms after stimulus onset.  Osterhout and Holcomb (1992) suggested that ERPs are sensitive to syntactic anomaly, and evaluated the hypothesis that the P600 and N400 effects are elicited as a function of anomaly type.  Hagoort (2003) investigated the effects of combining syntactic and semantic violations, and revealed an asymmetry in the interplay between syntax and semantics during sentence comprehension.  Many studies have examined various aspects of second-language acquisition.  For example, Hahne (2001) showed that native speakers of Russian who had learned German after the age of 10 exhibited weaker N400 and P600 effects than native German speakers.

Research Question:

The overarching goal of this research is to gain concrete insight into the cognitive methods and processes that individuals employ when they master a computer programming language and perform programming tasks.  Such insight could help in designing effective instruction for novice programmers, and would also advance the understanding of expertise in this domain.  Past research on programming and programmers has lacked a neurophysiological element, and this study aims to fill that gap in the literature.

The purpose of this experiment is to determine whether N400 and P600 effects can be observed when expert computer programmers are presented with semantic and syntactic errors in code.  The answer will indicate whether the brain processes natural languages and programming languages in similar ways.

Hypothesis:

Programmers will demonstrate N400 and P600 effects when presented with semantic and syntactic errors, respectively, in the programming language in which their expertise has been demonstrated and confirmed.  However, these components will display a reduced amplitude and a longer peak latency compared with the components elicited when native English speakers perform grammaticality judgement tasks on English sentences.  In other words, the observed effects should be analogous to those seen in expert second-language learners (Hahne, 2001).
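
The hypothesis is framed in terms of two measurable quantities: peak amplitude and peak latency.  The Java sketch below shows one way to extract both from an averaged waveform.  It assumes a fixed sampling rate, with sample 0 at stimulus onset.  It searches the 300–500 ms and 500–800 ms windows given in the Background, and it runs on a synthetic waveform rather than real data.  The class and method names (PeakLatency, peakIndex, indexToMs) and the 250 Hz rate are hypothetical and are not part of the proposal.

```java
class PeakLatency {
    // Returns the index of the most extreme sample within [startMs, endMs].
    // polarity = -1 finds the most negative point (N400); +1 finds the most positive (P600).
    // Assumes sample 0 corresponds to stimulus onset.
    static int peakIndex(double[] erpMicroVolts, double samplingRateHz,
                         double startMs, double endMs, int polarity) {
        int first = (int) Math.ceil(startMs * samplingRateHz / 1000.0);
        int last = Math.min(erpMicroVolts.length - 1,
                            (int) Math.floor(endMs * samplingRateHz / 1000.0));
        if (first < 0 || first > last) {
            throw new IllegalArgumentException("window contains no samples");
        }
        int best = first;
        for (int i = first + 1; i <= last; i++) {
            // Multiplying by polarity turns "most negative" into "largest".
            if (polarity * erpMicroVolts[i] > polarity * erpMicroVolts[best]) {
                best = i;
            }
        }
        return best;
    }

    // Converts a sample index to milliseconds after stimulus onset.
    static double indexToMs(int index, double samplingRateHz) {
        return index * 1000.0 / samplingRateHz;
    }

    public static void main(String[] args) {
        double rate = 250.0;              // hypothetical: 250 Hz, one sample every 4 ms
        double[] erp = new double[250];   // 1 s of post-stimulus data
        // Synthetic waveform: a negative deflection centred at 400 ms.
        for (int i = 0; i < erp.length; i++) {
            double t = indexToMs(i, rate);
            erp[i] = -5.0 * Math.exp(-Math.pow((t - 400.0) / 50.0, 2));
        }
        int peak = peakIndex(erp, rate, 300.0, 500.0, -1);
        System.out.println("N400 peak: " + erp[peak] + " uV at " + indexToMs(peak, rate) + " ms");
    }
}
```

Running main prints `N400 peak: -5.0 uV at 400.0 ms`.  Testing the hypothesis would then amount to comparing these two numbers between the programmers' averages and a native-speaker baseline.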

Methods:

For the purpose of selecting a population for this experiment, experts will be defined as programmers who have more than 7 years of professional experience using the designated programming language.  Expertise will further be assessed through a series of written tests administered online.  The number of subjects will purposefully be kept small, so as to avoid the counterproductive pitfalls of grand averaging functional brain imaging data (Savoy).  At the time of writing, the choice of programming language for the experiment has been narrowed down to Perl and Java.  Both are high-level languages, and expert subjects for both are available nearby.  The final choice will depend on further research into the construction of high-quality stimuli.

Stimuli and Conditions:

The stimuli will consist of standards (standard code phrases with no syntactic or semantic errors) and deviants (incorrect code phrases with either a syntactic or a semantic anomaly).  The standards will constitute 80% of the stimuli, and the deviants the remaining 20%.  Half of the deviants will be semantic violations and half will be syntactic violations, so each deviant type will make up 10% of all stimuli.  Because the N400 and the P600 require that participants pay attention, participants will be asked to respond at the end of every stimulus.  They will press one button if they consider the code phrase to be correct, and another button if they consider it to be incorrect.
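
Taken together, these proportions imply that 80% of trials are standards, 10% are semantic deviants and 10% are syntactic deviants.  The Java sketch below builds a shuffled trial list with exactly those proportions.  The session length of 200 trials, the fixed random seed and the names TrialSequence and build are hypothetical.  The sketch also ignores ordering constraints, such as preventing consecutive deviants, because the proposal does not specify any.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class TrialSequence {
    // Condition labels used in the proposal: good (standard), semantic deviant, syntactic deviant.
    enum Condition { GD, SM, SX }

    // Builds a shuffled list of trials: 80% GD, 10% SM, 10% SX.
    // totalTrials must be a positive multiple of 10 so the proportions come out exact.
    static List<Condition> build(int totalTrials, long seed) {
        if (totalTrials <= 0 || totalTrials % 10 != 0) {
            throw new IllegalArgumentException("totalTrials must be a positive multiple of 10");
        }
        int deviants = totalTrials / 5;          // 20% of all trials
        int semantic = deviants / 2;             // half of the deviants
        int syntactic = deviants - semantic;     // the other half
        int standards = totalTrials - deviants;  // the remaining 80%

        List<Condition> trials = new ArrayList<>(totalTrials);
        for (int i = 0; i < standards; i++) trials.add(Condition.GD);
        for (int i = 0; i < semantic; i++) trials.add(Condition.SM);
        for (int i = 0; i < syntactic; i++) trials.add(Condition.SX);
        Collections.shuffle(trials, new Random(seed)); // seeded so a session can be reproduced
        return trials;
    }

    public static void main(String[] args) {
        List<Condition> trials = build(200, 42L); // hypothetical session length and seed
        for (Condition c : Condition.values()) {
            System.out.println(c + ": " + Collections.frequency(trials, c));
        }
    }
}
```

For 200 trials, main prints GD: 160, SM: 20 and SX: 20.  Only the order of the trials depends on the seed.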

An example of a standard stimulus, using the Java programming language:

  String[] answer = new String[30];

An example of a deviant stimulus with a syntactic anomaly (parentheses where array creation requires square brackets), using the Java programming language:

  String[] question = new String(10);

An example of a deviant stimulus with a semantic anomaly, using the Java programming language:

  if(newObj1 == newObj2){...}
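
The semantic example presumably relies on a well-known Java pitfall.  For object references, == tests whether both operands refer to the same object, not whether the objects have equal values.  The line therefore compiles, but it may not do what the programmer intends.  The sketch below illustrates the distinction using String objects.  The variable names mirror the stimulus, but the class name ReferenceEquality and the string contents are arbitrary.

```java
class ReferenceEquality {
    // Returns {identity, valueEquality} for two distinct String objects with the same contents.
    static boolean[] compare() {
        // 'new String(...)' always creates a separate object, even for identical text.
        String newObj1 = new String("stimulus");
        String newObj2 = new String("stimulus");
        boolean sameObject = (newObj1 == newObj2);   // reference comparison
        boolean sameValue = newObj1.equals(newObj2); // value comparison
        return new boolean[] { sameObject, sameValue };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("newObj1 == newObj2      : " + r[0]); // false
        System.out.println("newObj1.equals(newObj2) : " + r[1]); // true
    }
}
```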

There will thus be three conditions for each electrode montage: SM (semantic), GD (good), and SX (syntactic).  The same three conditions will be compared in both the N400 region and the P600 region.

Considerations:

Working memory limitations are a serious consideration in the design of this experiment.  Because the code will be presented in fragments, the experiment will place a considerable strain on each subject’s working memory.  An eye-tracking system would alleviate this problem by allowing an entire code phrase to be presented at once.  Fixation onsets recorded by the tracker could then serve as the time-locking events for the ERPs.  However, eye tracking has not yet been implemented at the Columbia EEG laboratory.  The experiment will therefore compensate by strictly limiting the length of the code-phrase stimuli.

Furthermore, this kind of presentation (one word, or even one code phrase, at a time) does not replicate the natural way in which programmers read source code.  English sentences can usually be interpreted and processed outside of a broader context, but the same does not typically apply to computer code.  Again, an eye-tracking system would remedy this problem by allowing the stimuli to consist of many lines of code presented at once.  Those lines would provide the necessary context for the syntactic and semantic anomalies.  For the time being, however, the programmers can first familiarize themselves with the full program, and only then be exposed to the short code-phrase stimuli.  This will establish the necessary context for individual pieces of code.

Analysis:

The analysis will focus on contrasting the response to the standard stimuli with the responses to the two types of deviant stimuli.  There is a single group of subjects (Control) and three conditions (semantic, syntactic, and good).  Responses will be grouped by stimulus condition.
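
Grouping responses by condition conventionally means averaging the time-locked EEG epochs of all trials in a condition, sample by sample.  This produces one ERP waveform per condition.  The Java sketch below assumes that each trial has already been epoched, and baseline-corrected, into an equal-length array of voltages for one electrode or montage.  The Epoch class, the ConditionAverage name and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class ConditionAverage {
    enum Condition { SM, SX, GD }

    // Hypothetical container for one epoched trial at a single electrode or montage.
    static final class Epoch {
        final Condition condition;
        final double[] microVolts;
        Epoch(Condition condition, double[] microVolts) {
            this.condition = condition;
            this.microVolts = microVolts;
        }
    }

    // Averages all epochs of each condition sample by sample.
    // Assumes every epoch has the same number of samples.
    static Map<Condition, double[]> average(List<Epoch> epochs) {
        Map<Condition, double[]> sums = new EnumMap<>(Condition.class);
        Map<Condition, Integer> counts = new EnumMap<>(Condition.class);
        for (Epoch e : epochs) {
            double[] sum = sums.computeIfAbsent(e.condition, c -> new double[e.microVolts.length]);
            for (int i = 0; i < sum.length; i++) sum[i] += e.microVolts[i];
            counts.merge(e.condition, 1, Integer::sum);
        }
        // Divide each running sum by its trial count to obtain the ERP.
        for (Map.Entry<Condition, double[]> entry : sums.entrySet()) {
            int n = counts.get(entry.getKey());
            double[] sum = entry.getValue();
            for (int i = 0; i < sum.length; i++) sum[i] /= n;
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Epoch> epochs = List.of(
            new Epoch(Condition.GD, new double[] {1.0, 2.0, 3.0}),
            new Epoch(Condition.GD, new double[] {3.0, 2.0, 1.0}),
            new Epoch(Condition.SM, new double[] {-2.0, -4.0, -6.0}));
        for (Map.Entry<Condition, double[]> entry : average(epochs).entrySet()) {
            System.out.println(entry.getKey() + " " + Arrays.toString(entry.getValue()));
        }
    }
}
```

Averaging works because activity that is not time-locked to the stimulus tends to cancel out across trials.  For the same reason, the rarer deviant conditions will yield noisier averages than GD unless enough deviant trials are collected.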

For the N400, we predict a strong effect in SM:

– SM vs GD –> the mid time window will present a more negative wave in the deviant semantic condition than in the standard condition.

– SM vs SX –> the mid time window will present a more negative wave in the deviant semantic condition than in the deviant syntactic condition.

ANOVA: (3×3) – Time (early, mid, late) × Condition (SM, SX, GD)

T-test: SM vs GD, SM vs SX, GD vs SX
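
The ANOVA and the t-tests both need a single number per subject, condition and time window.  The proposal does not name this dependent measure, so the mean amplitude over each window is assumed here.  The window boundaries are also hypothetical.  The mid and late windows follow the 300–500 ms and 500–800 ms ranges given in the Background, and the early window (100–300 ms) is purely illustrative.  The waveforms are synthetic, not data.

```java
class WindowMeans {
    // Hypothetical window boundaries in milliseconds after stimulus onset.
    static final double[][] WINDOWS_MS = { {100, 300}, {300, 500}, {500, 800} };
    static final String[] WINDOW_NAMES = { "early", "mid", "late" };

    // Mean voltage of the samples whose latency falls in [startMs, endMs).
    static double meanAmplitude(double[] microVolts, double samplingRateHz,
                                double startMs, double endMs) {
        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < microVolts.length; i++) {
            double t = i * 1000.0 / samplingRateHz; // sample 0 = stimulus onset
            if (t >= startMs && t < endMs) {
                sum += microVolts[i];
                n++;
            }
        }
        if (n == 0) throw new IllegalArgumentException("no samples in window");
        return sum / n;
    }

    // One value per window: the dependent measure for one subject and condition.
    static double[] windowMeans(double[] microVolts, double samplingRateHz) {
        double[] means = new double[WINDOWS_MS.length];
        for (int w = 0; w < WINDOWS_MS.length; w++) {
            means[w] = meanAmplitude(microVolts, samplingRateHz, WINDOWS_MS[w][0], WINDOWS_MS[w][1]);
        }
        return means;
    }

    public static void main(String[] args) {
        double rate = 250.0;           // hypothetical sampling rate (4 ms per sample)
        double[] sm = new double[250]; // synthetic SM average: -4 uV from 300 to 496 ms
        double[] gd = new double[250]; // synthetic GD average: flat
        for (int i = 75; i < 125; i++) sm[i] = -4.0;
        double[] smMeans = windowMeans(sm, rate);
        double[] gdMeans = windowMeans(gd, rate);
        for (int w = 0; w < WINDOW_NAMES.length; w++) {
            System.out.printf("%s: SM - GD = %.2f uV%n", WINDOW_NAMES[w], smMeans[w] - gdMeans[w]);
        }
    }
}
```

With these synthetic inputs, the SM minus GD difference is -4.00 uV in the mid window and 0.00 uV in the early and late windows.  This is the pattern the N400 prediction describes.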

For the P600, we predict a strong effect in SX:

– SX vs GD –> the late time window will present a more positive wave in the deviant syntactic condition than in the standard condition.

– SX vs SM –> the late time window will present a more positive wave in the deviant syntactic condition than in the deviant semantic condition.

ANOVA: (3×3) – Time (early, mid, late) × Condition (SM, SX, GD)

T-test: SX vs GD, SX vs SM, GD vs SM
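
Every subject contributes data to all three conditions, so the pairwise t-tests are presumably paired (within-subject) tests.  The proposal does not say so explicitly, so this is an assumption.  The Java sketch below computes the paired t statistic from one value per subject and condition, for example a late-window mean amplitude.  Converting t into a p-value requires the t distribution with n - 1 degrees of freedom, which the Java standard library does not provide, so the sketch stops at the statistic.  The input numbers are illustrative only and are not results.

```java
class PairedTTest {
    // Paired t statistic for two conditions measured on the same subjects:
    // t = mean(d) / (sd(d) / sqrt(n)), where d = a - b for each subject and df = n - 1.
    static double pairedT(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples with n >= 2");
        }
        int n = a.length;
        double meanDiff = 0.0;
        for (int i = 0; i < n; i++) meanDiff += a[i] - b[i];
        meanDiff /= n;
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - meanDiff;
            ss += dev * dev;
        }
        double sd = Math.sqrt(ss / (n - 1)); // sample standard deviation of the differences
        return meanDiff / (sd / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Illustrative late-window mean amplitudes (uV) for five hypothetical subjects.
        double[] sx = {3.1, 2.4, 4.0, 2.9, 3.6};
        double[] gd = {0.9, 1.1, 1.5, 0.4, 1.2};
        System.out.printf("SX vs GD: t(%d) = %.2f%n", sx.length - 1, pairedT(sx, gd));
    }
}
```

The same function serves all six comparisons listed for the two components.  Only the pair of condition arrays passed in changes.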

References:

Ainsworth-Darnell, K., Shulman, H.G., and Boland, J.E. (1997). Dissociating Brain Responses to Syntactic and Semantic Anomalies: Evidence from Event-Related Potentials. Journal of Memory and Language, 38, 112-130.

Coles, M.G.H., and Rugg, M.D. (1995). Event-related brain potentials: An introduction. In M. Rugg and M. Coles (Eds.), Electrophysiology of Mind. Oxford, U.K.: Oxford University Press.

David, O., Harrison, L., and Friston, K.J. (2004). Modelling event-related responses in the brain. NeuroImage, 25(3), 756-770.

Federmeier, K.D., and Kutas, M. (1999). A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language, 41(4), 469-495.

Hagoort, P. (2003). Interplay between Syntax and Semantics during Sentence Comprehension: ERP Effects of Combining Syntactic and Semantic Violations. Journal of Cognitive Neuroscience, 15(6), 883-899.

Hahne, A. (2001). What’s different in second-language processing? Evidence from event-related brain potentials. Journal of Psycholinguistic Research, 30(3), 251-265.

Kutas, M., and Hillyard, S.A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203-205.

Kutas, M., and Hillyard, S.A. (1983). Event-related brain potentials to grammatical errors and semantic anomalies. Memory and Cognition, 11, 539-550.

Kutas, M., and Hillyard, S.A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161-163.

Letovsky, S., Pinto, J., Lampert, R., and Soloway, E. (1987). Comprehension Strategies in Programming. In Empirical Studies of Programmers: Second Workshop (pp. 231-247). Norwood, New Jersey: Ablex Publishing Corporation.

Osterhout, L., and Holcomb, P.J. (1992). Event-Related Brain Potentials Elicited by Syntactic Anomaly. Journal of Memory and Language, 31, 785-806.

Pennington, N. (1987). Comprehension Strategies in Programming. In Empirical Studies of Programmers: Second Workshop (pp. 100-113). Norwood, New Jersey: Ablex Publishing Corporation.
