Paper review: “Tiered Categorization of a Diverse Panel of HIV-1 Env Pseudoviruses for Assessment of Neutralizing Antibodies”

Key authors:

  • Division of Viral Pathogenesis, Beth Israel Deaconess Medical Center, Harvard Medical School (Boston, MA): Michael S. Seaman
  • Statistical Center for HIV/AIDS Research and Prevention (Seattle, WA): Holly Janes, Natalie Hawkins, Linda Harris, Blake Wood, Steve Self, Peter B. Gilbert.
  • Theoretical Biology and Biophysics, Los Alamos National Laboratory (Los Alamos, NM): Marcus G. Daniels, Tanmoy Bhattacharya, Alan Lapedes, Bette T. Korber
  • Walter Reed Army Institute of Research (Rockville, MD): Victoria R. Polonis, Francine E. McCutchan
  • Department of Surgery, Duke University Medical Center (Durham, NC): David C. Montefiori
  • National Institute of Allergy and Infectious Diseases, National Institutes of Health (Bethesda, MD): John R. Mascola
  1. Problem to address: restricted neutralization breadth of vaccine-induced neutralizing antibodies is a limitation to HIV-1 vaccine development.
  2. Questions to answer: can we develop a way to quantitatively assess the neutralization sensitivity of circulating HIV-1 strains against a genetically diverse panel of HIV-1(+) patient plasma samples? Can we create a reference panel of HIV-1 for different subtypes (B, C, CRF01_AE, CRF02_AG, etc.) that are both geographically diverse and represent neutralization phenotypes representative of “primary isolate strains that a vaccine would need to protect against”?
  3. Experimental rationale: assemble a panel of 109 molecularly cloned HIV-1 Env pseudoviruses (partially replication-incompetent) representing a “broad range of genetic and geographic diversity” (i.e., samples from acute and chronic infection, and from all major subtypes of HIV-1). Test this panel against a panel of genetically diverse HIV-1-positive (HIV-1+) plasma pools (mixtures of plasma from the same geographic regions).
  4. Methods of measurement (primary): pseudovirus-containing supernatant was titrated into 50% tissue culture infectious dose (TCID_50) quantities, with infection read out by luciferase assay. For neutralization, 200 TCID_50 of pseudovirus was added to 3-fold serial dilutions of plasma pools, followed by 1×10^4 cells; supernatant was then removed and 100 µL of luciferase reagent added. The 50% inhibitory dose titer (ID_50) was obtained by measuring luciferase activity in cells after a single round of infection by Env pseudoviruses, read on a Victor 3 luminometer. ID_50 is defined as the serum dilution that causes a 50% reduction in relative luminescence units (RLU) compared to the level in the virus control wells after subtraction of cell control RLU.
  5. Statistical rationale: rank ordering of average log_10 ID_50 titers across 7 plasma pools. K-means clustering, with the number of clusters chosen as the one that achieved a threshold (80%) level of stability in bootstrap analysis. Impact of viral characteristics on neutralization sensitivity investigated with a linear generalized estimating equation (GEE) model, via a binary indicator variable of virus/sera clade match. A further statistical test (I didn’t note which one) was used to validate the approach of pooling sera by geographic region.
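
To make the ID_50 definition in item 4 concrete for myself, here is a minimal Python sketch. The function names and the example RLU values are my own, and the linear interpolation on log10(dilution) between the two bracketing dilutions is an assumption on my part; the paper may well use a different curve-fitting procedure.

```python
import math


def percent_neutralization(rlu_sample, rlu_virus_control, rlu_cell_control):
    """Percent reduction in background-subtracted RLU relative to the virus control."""
    signal = rlu_sample - rlu_cell_control
    max_signal = rlu_virus_control - rlu_cell_control
    return 100.0 * (1.0 - signal / max_signal)


def id50(dilutions, rlus, rlu_virus_control, rlu_cell_control):
    """Reciprocal dilution at which neutralization drops through 50%.

    dilutions: reciprocal plasma dilutions in increasing order (e.g. 20, 60, 180).
    rlus: one RLU reading per dilution.
    Assumption: interpolate linearly on log10(dilution) between the two
    dilutions that bracket 50%. Returns None if 50% is never crossed.
    """
    neut = [
        percent_neutralization(r, rlu_virus_control, rlu_cell_control) for r in rlus
    ]
    for i in range(len(neut) - 1):
        above, below = neut[i], neut[i + 1]
        if above >= 50.0 > below:
            frac = (above - 50.0) / (above - below)
            log_lo = math.log10(dilutions[i])
            log_hi = math.log10(dilutions[i + 1])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Made-up readings for a 3-fold dilution series (not data from the paper).
example_dilutions = [20, 60, 180, 540]
example_rlus = [1100, 3100, 7100, 9100]
example_titer = id50(example_dilutions, example_rlus, 10100, 100)
```

In this made-up series neutralization falls from 70% at 1:60 to 30% at 1:180, so the interpolated titer lands between those two dilutions.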

Update: will talk to Holly Janes tomorrow (1/25/13) to discuss why the paper wasn’t “the best”.

Continued next time (or in the comments): results, future research of viral neutralization sensitivity, current state of affairs.



Posted in Uncategorized | Leave a comment

Greg re: time series data

Today I stopped by Greg’s office and we talked a bit about this time series course he’s taking currently (winter ’12). I spoke to him about probability and weighting information in poker, and he mentioned an exponential weighting scheme that seemed pretty cool (I wonder how to modify it to “fit” certain types of data – how to “weight” the “weights” between observations?)

Okay, so the setup is:

  • weight w in [0,1],
  • observations X_{t-1}, X_{t-2}, …, X_{t-h} in some filtration F_{t-1},

predict X_t using a weighting scheme of the form f(w·X_{t-1}, w^2·X_{t-2}, …, w^h·X_{t-h})
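
A minimal Python sketch of one possible choice of f, assuming a normalized weighted average (the normalization and the function name are my own choices, not something Greg specified). Because the weights are divided by their sum, w = 0 is excluded here.

```python
def exp_weighted_prediction(history, w):
    """Predict X_t from history = [X_{t-1}, X_{t-2}, ..., X_{t-h}] (most recent first).

    Observation X_{t-k} gets weight w**k; the weights are normalized to sum to 1.
    This normalized average is just one assumed form of f.
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("w must be in (0, 1] for the normalized average")
    if not history:
        raise ValueError("need at least one past observation")
    weights = [w ** k for k in range(1, len(history) + 1)]
    total = sum(weights)
    return sum(wk * x for wk, x in zip(weights, history)) / total


# Binary observations: 1 = voluntarily put money in the pot, 0 = did not.
recent_first = [1, 0, 0]
prediction = exp_weighted_prediction(recent_first, 0.5)
```

With w = 1 this reduces to a plain average of the last h observations; smaller w leans harder on the most recent hands.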

I mentioned to him that I wanted to decide on an information weighting scheme that predicts X_t (whether or not a person would voluntarily put money in the pot, i.e. VP$IP%) on the basis of F_{t-1}, but we also talked about the possibility of correlations between related variables (Y_{t-1}, etc.), whose correlation structure could give you information about other variables.

I think correlation structure and time series modeling are pretty powerful concepts that could help with a way to represent objects/observations like this.

We were wrestling with the idea that within the “game of poker”, there’s the (1) individual hand in a session vs. player(s) that has (stack sizes, #players, positions, game history), (2) individual session at a casino, and (3) a long-term set of sessions. How to efficiently store a representation of these is still somewhat of a mystery. This is a programming, statistics, and database management issue that would be pretty interesting to solve.
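
One way the three levels might be nested, as a rough Python sketch. Every class and field name here is hypothetical, and this is only an in-memory layout; it says nothing about how a real database would store it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hand:
    """Level 1: a single hand (all fields are illustrative)."""
    stack_sizes: List[float]
    num_players: int
    position: str
    vpip: bool  # did the tracked player voluntarily put money in the pot?
    actions: List[str] = field(default_factory=list)  # game history for the hand


@dataclass
class Session:
    """Level 2: one session at a casino."""
    casino: str
    date: str
    hands: List[Hand] = field(default_factory=list)

    def vpip_rate(self) -> Optional[float]:
        """Fraction of hands with voluntary money in the pot, or None if empty."""
        if not self.hands:
            return None
        return sum(h.vpip for h in self.hands) / len(self.hands)


@dataclass
class History:
    """Level 3: a long-term set of sessions."""
    sessions: List[Session] = field(default_factory=list)

    def all_hands(self) -> List[Hand]:
        """Flatten every session into one chronological list of hands."""
        return [h for s in self.sessions for h in s.hands]


example_session = Session(
    casino="Example Casino",
    date="2013-01-20",
    hands=[
        Hand([200.0, 150.0], 2, "button", True),
        Hand([210.0, 140.0], 2, "big blind", False),
    ],
)
example_history = History(sessions=[example_session])
```

The nesting mirrors the hand / session / long-term split directly, which keeps per-session summaries like VPIP easy to compute, though a relational layout might suit queries across sessions better.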

In somewhat related news, I’m really trying to sink my teeth into PQL, especially to learn about PLO. Maybe there are some good beginning user guides that I can cut my teeth on before I can get some of my own ideas flowing.

Posted in Poker | Leave a comment

Meta-post: One organizational scheme to rule them all

So I’ve always had multiple journals running on different but overlapping subjects, including statistics, immunology and infectious disease, probability, poker (both session logs and thoughts), personal journal about life events, and those related to fitness and making money, as well as entrepreneurial ideas that I’d like to cultivate.

So far, they’ve been managed with disparate journals (on the computer: random .txt/.docx logs for poker and life, a livejournal on poker and random life events, this wordpress blog, and random emails sent back and forth to myself). I’d like to unify this process, but I have a few concerns:

  • Privacy: my thoughts aren’t that important or ground-breaking, but they reveal a degree of personal insight that I wouldn’t want everyone to know. I’ve struggled with this tension between open disclosure being conducive to an environment for discussion and improvement, and it being a way to let unsavory/malicious people know your inner thoughts. Also, with work, there’s the potential for being scooped (though most of the projects that I’m working on either aren’t super high-profile or groundbreaking, or I haven’t gotten involved enough to need to worry about what I post).
  • Organization: this whole concept of categories/tags will help with keeping related categories together. A few questions that come to mind are: (1) what’s the difference (besides hierarchical categorization) between tags and categories? (2) how do you filter based on category/tag?
  • Coalescence: how to unify multiple logs? I have: (1) email transcripts of poker and journal entries, (2) paper log of journal entries, (3) gmail calendar of events and appointments, and (4) a rememberthemilk account of appointments, loosely linked to my phone and gmail calendar.

So what are some of the things I want to keep track of with a wordpress blog? The main topics that come to mind are:

  1. Statistics and probability – keep track of the concepts that I’ve been introduced to, techniques that I find useful, books I’ve read on statistics and probability, videos I’ve watched related to math and biology (MIT opencourseware), etc.
  2. Poker – session logs, probability exploration using the web-based tools, discussion of poker concepts from personal correspondence and via online message boards, foray into pot limit omaha (and the trials and tribulations of doing so). I want to highlight the conscious thinking that goes behind each session, as well as to continually emphasize the fact that it’s a learning process and I’ve got to keep my mouth shut and my ears wide open.
  3. Fitness – session logs, workout plans, food preparation, sleep schedule, etc. I’d like to set some clearly defined goals and detail my progress in approaching them.
  4. Professional development – concepts related to exploring career options and money making opportunities, with an eye for long-term sustainability and enjoyment. In particular, hot/up-and-coming enterprises are worth noting down, as sometimes they slip in one ear and out the other.
  5. Graduate school – detailing the effort behind pursuing a graduate degree and the fun that comes along with it. Hopefully having a blog will help with the memory consolidation aspect.
  6. Science – my reading of the vast number of papers in mathematics, statistics, immunology, virology, vaccinology, and general biology may benefit from careful planning and writing summaries. Plus, I want to get a sense of the history/continuity of developing ideas in these fields (not that I’m anywhere close to the “cutting edge” at this point!)
  7. Pro tips – anything related to living smart(ly). Could include travel recommendations, cool services, etc.

So the gameplan at this point is:

  1. Define some goals in the next post about what I’d like to accomplish in these arenas and ways to keep track of my progress
  2. Create categories for each of these sections and work out the tag/category question
  3. Merge your journals with each other (while keeping a majority of the content private)
  4. Try out publishing “errything” for a while and see what works/what doesn’t work
  5. Consider separating topics/blogs into work-related/non-work-related




New software: Pandoc, RememberTheMilk, HyPhy, oh my!

Learning how to use software is a gradual process; in my experience progress occurs nonlinearly (advances tend to come in spurts). Is it worth the potential efficiency gain to learn something new? Well, that’s a decision that I can only come to a conclusion about after I’ve tried…

A few new things to learn:

  1. Pandoc – a universal document converter. This way, documents can be output to PDF, LaTeX, HTML, .docx, etc. with minimal effort. Handles citations very well. Requires GHC, a Haskell compiler.
  2. Haskell – a functional language with a compiler too! Stupid Q: why is a functional language paired with a compiler?
  3. LearnYouAHaskell – a pictorial resource for learning Haskell – if anything, it’s a really good example of a well-put-together tutorial with cool pictures to increase interest in the topic 🙂
  4. RememberTheMilk – great resource for managing todo lists…
  5. HyPhy – software package for analyzing genetic sequence data. Supposedly, we can use it to reconstruct an HIV-specific PAM-within matrix to estimate evolutionary distance between two sequences within a subject.

You’ll never know until you ask

Met some influential people today and had some notable conversations – in their order:

  1. Talked with Peter about the longitudinal sequence data, explaining the project (small sample size, how we’re treating the 454 data, what metrics we’re using to evaluate sequence evolution over time). We got into a brief discussion about PAM matrices (see Nickle 2007 for discussion of the origin of the PAM-among matrix) and how using a within-subject HIV-specific PAM matrix may be the best way to measure viral evolution in a patient over time. We briefly discussed the difficulty of identifying the sample numbers that correspond to the sequences – the sampling schedule changes after HIV+ diagnosis (Dx+), and the procedure for diagnosing HIV-1 infection involves a look-back procedure once the (cheaper) p24 ELISA assay comes back positive. Overall he seemed less angry than I thought he would be, given that it took me a week to frantically piece together the notes I’d been writing while poring over the data. I asked him for some guidance in the analysis and he seemed receptive to the idea – I think if I move the progress along and go through the required steps (data visualization via descriptive plots, then work towards inference) he will be more willing to offer support. We ended with his recommendation that I split the plots apart by subject and put some form of evolutionary distance measure (say PAM or BLOSUM) on the Y axis, plotting each position with changes as a single point. One could consider both the distance from the first sample and the distance from the insert over time; the choice of reference could be swapped out, as could the distance metric. I told him I’d have a quick turnaround time, but he just told me to show him the plots when I got them. This is something that I hope to forge through today (even though it’s 4:45pm and I just got back to my office!)
  2. Talked to Greg, a 3rd-year stats grad student who was in the shared RA office – he works for Raphael on ICS data, and he was very friendly and receptive to my questions on the relative use of measure theory, linear algebra, parameter estimation using frequentist or Bayesian methods, what Bayesian inference “looks like” when applied, how MCMC works to numerically estimate an invariant distribution, and interesting peculiarities of MC’s (like multi-modality) that can be diagnosed with trace plots… Overall, he spoke pretty positively about his experience and seemed genuinely surprised at what he’d learned over the years… Sometimes it was hard for me to convey that I understood some concepts more readily than others – it just comes down to communicating well with the person who’s trying to teach something. It was cool to hear him explain the concepts! I hope to be able to do the same (soon enough).
  3. Stopped by Paul’s office before I left – he was very busy working on a mathematical proof he was sending to his advisor, Art. I took a lot of his time explaining the situation with Peter and what Peter proposed our approach should be. He said it’s good that Peter is getting involved, but cautioned me not to backtrack too much; Peter’s proposed ideas aren’t a “scaling back” but actually a “moving forward”, implementing a distance metric vs. just looking at the “counts” of changes between sequences. He also told me to send him my material for graduate school applications – it was then that I told him I thought I’d written a poor personal statement and that I hadn’t gotten my stuff together with the recommendation process… In the end, he seemed happy to look at my material (whatever I had) and I made sure to send it to him before I left for UW campus. Still hashing it all out – this will come!
  4. Met Hasan in the lunchroom and asked him about distance metrics – we had a brief discussion about PAM matrices, and we talked briefly about Master’s degrees in biostatistics vs. bioinformatics (he says the cross-listing of courses at his school, Emory, wasn’t there between the two departments). While they’re both useful, his impression of bioinformatics was that it dealt with the computational approaches to manipulating the data, rather than formulating the statistical questions. Something to think about – I want to have a solid mastery of the implementations as well as the theoretical foundation of statistics to develop new approaches, but I’ve only got one lifetime! How to maximize impact…
  5. The seminal discussion came when I visited Prof. Bookstein in his office in the B-wing of Padelford. It’s an awfully old building built over the edge of a steep hill and just has a really “solid” old feel to it. I spoke with him about my path towards statistics and he seemed very interested in hearing about the applications of “decisions under uncertainty” and “systems under indirect observation”. Some notes I took:
  • Recommended Nassim Taleb’s books, “Fooled by Randomness” and “Antifragile”, which deal with the properties of “long-tailed distributions”.
  • Artificial Intelligence as a topic of interest – requires introduction to probability (taught by Michael Perlman) and stochastic processes (by V. Minin)
  • Data-Driven Discovery by Ed Lazowska & Werner Stuetzle – there’s demand for faculty with “machine learning” as a topic of interest.
  • Q: Which department would you want to be in? CS? Is there a department of machine intelligence? If you had the opportunity, would you do an inter-departmental degree bridging the gap between two disciplines?
  • “Systems under indirect observation” are similar to poker/medical diagnoses. Working with limited time/resources, attempt to discern the causal variables whose identity would determine the cause with the greatest certainty/least ambiguity.
  • Marina Meila is an AI-focused professor, though she tends to be conservative with her ideas (which is not necessarily a bad thing).
  • Emily Fox – don’t remember why he mentioned her, but it’s worth looking into.
  • Pattern analysis of real data – it’s going to be something we see much more often
  • Peter Ward RE: climate change, good application of statistics to change social perspectives/impact policy, hopefully?
  • Books to study:
      • Feller – Probability Theory and its Applications
      • Cramér – Mathematical Methods of Statistics
      • Wilks – Mathematical Statistics
  • Question: how do we go from a DATASET to a MULTIDIMENSIONAL distribution? Still an ‘unsolved’ problem
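
Coming back to Peter’s suggestion in item 1 (per-position distance from the first sample, by subject), here is a rough Python sketch of the bookkeeping. The cost table is a made-up toy, not real PAM or BLOSUM values (those are log-odds similarity scores and would need converting into a distance), and the sequences and function name are invented for illustration.

```python
def per_position_distance(reference, sample, cost, default_cost=1.0):
    """Per-position substitution cost between two aligned amino-acid sequences.

    cost maps frozenset({a, b}) to a non-negative cost for swapping a and b.
    Identical residues cost 0; pairs missing from the table get default_cost.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    distances = []
    for a, b in zip(reference, sample):
        if a == b:
            distances.append(0.0)
        else:
            distances.append(cost.get(frozenset((a, b)), default_cost))
    return distances


# Toy cost table (hypothetical): conservative swaps cost less than others.
toy_cost = {frozenset("IV"): 0.5, frozenset("KR"): 0.5}

# Invented aligned sequences for one subject: first sample vs. a later one.
first_sample = "KIVE"
later_sample = "RIAE"
position_costs = per_position_distance(first_sample, later_sample, toy_cost)
total_distance = sum(position_costs)
```

Each nonzero entry of position_costs would become one point on the plot for that time point; swapping first_sample for the insert sequence gives the other reference Peter mentioned.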

Update: Post-First Application Jitters


  • Turned in my UW biostat application late, and got an advisor to reopen the application so I could turn in my application fee and upload my CV (which I did not update to include the projects that I’ve worked on at FHCRC)…
  • Did a half-baked job on my personal statement, so it’s not formatted well and needs much work to convey the excitement about my work experience, potential educational experience, etc. Get a head start on your next applications!
  • Still haven’t asked June Morita for a letter of recommendation… Do this TODAY!
  • Sent an email to Fred Bookstein RE: topics to focus on in graduate school, the idea of applying to graduate school, general strat… He’ll email back with a lunch date soon.
  • Talked with Richard RE: schools and the application process. Output: go big or wait until the next application process, don’t turn in an application that will cripple/burn your chances of applying down the road. Also stop to consider what your long-term career ambitions are, and make sure that your decision to apply for graduate studies is a well-considered decision.

Recap/things to consider:

  • Communicating with Paul has been very valuable; meeting over lunch really helped key you into various concepts and their value. This is something that should be repeated.
  • Work harder in your job to get the full benefit of your experience here. Not only has the organization given you many opportunities to get involved, it feels good to give back to the community that’s taken you “into the fold” as much as they have.
  • Consider the larger aspects of graduate studies / life plan when you have the opportunity.



Beginning the Journey

  Hi, my name is Michael and I’m an aspiring student of statistics and biochemistry with particular interest in medicine and infectious disease. My purpose for creating this blog is to reflect on my path through undergraduate education at the University of Washington in biochemistry and statistics, to my formal work experience at Fred Hutchinson Cancer Research Center where I’ve had a chance to apply my knowledge through writing programs to implement statistical methods for analyzing HIV vaccine trial data. Through writing in this blog I hope to discover the strengths and weaknesses of my knowledge base and further define and explore my interests in statistics and medicine. 
