Wednesday, November 11, 2015

The Human Factor: How Do We Gauge Rider Preferences?

So I feel like a completely hopeless case of bike geekery after having spent an entire evening "working" on this! But after I posted about the Bella Ciao two-fork project, a friend from my former academic days directed my attention to the bit where I wrote the following:
...there is no question of this project being an experiment in the scientific sense - if only because the tester (myself, that is) was aware from the start which version was which, and had pre-conceived notions of what to expect from each. But I think that, at least in theory, it would be possible to arrange a blind test of this bike with the two forks. ...The visual difference is quite subtle - so that [potential testers] might very well not know which version they'd be trying, even after switching from one to the other.
"Oh come on, you can't just leave it at that! Why not describe how you'd go about doing that blind test?"

"Like, if someone were to do an actual experiment?"


"Hmm... Because it would be a colossal waste of time?"

"Yes, I suppose it would. But what fun!"

With a drink in hand and a long stormy night ahead, it began to look that way.

So how would we gauge rider preference in bicycle handling, and why would we even care? Well, to backtrack a tad, here is the thing. Most of what we know about bicycle handling is derived from the field of Physics. That seems only right and proper - we've got stuff like energy, momentum, rolling resistance, centrifugal forces to deal with - clearly all in the Physics domain. Now, Physics is a hard Science, chock-full of laws and numbers and formulas and stuff. Therefore, we expect it to yield concrete, unambiguous answers to our bicycle handling questions - which it does, and which are the answers that much of bicycling literature presents as fact. And don't get me wrong: It is fact. Nevertheless it does not tell the whole story. Because physics alone fails to address something quite crucial: the human factor - that is, the rider's perception of handling, which, in practice, is really the final product that interests us. "How will this bicycle feel for us to ride?" is the question we are really asking when we ask about its handling. And that question cannot be answered by physics alone.

Take as an example the "low trail" question. It is fair to say that there is disagreement among riders as to what effect, or cluster of effects, this aspect of front-end geometry has on handling. Anecdotally, some riders report that they find it conducive for riding no-hands, others feel it makes riding no-hands impossible. Some find it easier to corner at high speeds, others find it more difficult. Even in a broader sense, some find the general sensation of low trail handling intuitive whereas others describe it as disconcerting. Of course the problem is that all of this feedback is anecdotal. Furthermore, it tends to come from riders who are passionate about the topic, and have pre-conceived notions about bicycle handling or design which might bias their impressions (present company included!).

Like it or not, we are in the realm of subjective human sensations here. So if we want to learn the real answer to a question such as "How do riders tend to respond to a bicycle's trail properties?" we need to design a proper experiment, in a controlled environment, and in a way that cuts through these potential biases. And for that it is useful to turn to the field of experimental psychology.

As it happens, this is a field I've dedicated a good few years of my life to in the past. If I were designing a study to gauge trail preferences based on the two-forked bike sent to me by Bella Ciao, it would look something like this (complete with pained attempt at catchy title):


Happy Trails: Gauging Rider Preference in Front-End Geometry on a Classic Upright Bicycle

The purpose of this study is to determine how cyclists will respond to a bicycle with standard vs low trail geometry.

Several groups of participants will be asked to ride bicycles over a pre-determined course, then provide feedback about their experience.

Participant Recruitment
To attain statistical significance in data analysis, a minimum of 48 adult participants should take part in this study. The target participant is an experienced transportation cyclist within the height range appropriate for the sample bicycle used in the study, and with limited knowledge of (or interest in) bicycle geometry.

Initial participant recruitment may take place at bicycling events, popular bike parking facilities, etc., where such persons are likely to be found. Participants may be offered compensation for their time. In the first instance, potential participants will be asked two basic qualifying questions: whether they’d be interested in taking part in a bicycling-related study, and what is the nature of their cycling experience. Those who answer “yes” to question 1 and report a minimum of 2 years regular bicycle commuting experience will be deemed eligible for the study and invited for further screening.

Potential participants will then be given a number of screening questions to ensure that their knowledge of bicycle geometry is appropriately limited. Participants who "pass" (i.e. fail) the geometry test will be invited to take part in the experimental trials.

Participants will take part in the study one at a time. Upon arrival, each participant will be told that they will take part in two back-to-back trials, where they will be asked to ride two very similar bicycles over an identical course, then report how each bicycle feels. The exact nature of the difference between the two bicycles they will try will not be explained to them.

Trial 1:
Each participant will be given a bicycle and asked to ride it along a pre-determined loop over varied terrain. Upon completion of the course, the participant will be handed a questionnaire designed to solicit feedback about various aspects of ride quality and handling, as well as to gauge their attitude toward the bike (i.e. how much they like it).

Trial 2:
After a short break, the participant will be given an identical-looking bicycle. They will be told it is very similar to the first bicycle they tried, and that they might notice some differences, or they might not. The procedure described in Trial 1 will then be repeated.

To control for potential placebo and priming effects, participants will be assigned at random to one of 4 groups, with 12 participants in each:
  Experimental Group 1
  Experimental Group 2
  Control Group 1
  Control Group 2

The bicycles which participants receive in trials 1 & 2 will differ according to the group they are assigned to, in the following manner:

Experimental Group 1:
Participants in this group will ride the standard trail bicycle in Trial 1, followed by the low trail bicycle in Trial 2.

Experimental Group 2:
Participants in this group will ride the low trail bicycle in Trial 1, followed by the standard trail bicycle in Trial 2.

Control Group 1:
Participants in this group will ride the standard trail bicycle in both Trial 1 and Trial 2 (while still believing they are trying two different bikes).

Control Group 2:
Participants in this group will ride the low trail bicycle in both Trial 1 and Trial 2 (while still believing they are trying two different bikes).

The purpose of the two experimental groups is to control for potential priming effects, whereby the order in which the bicycles are tried might colour rider experience. The purpose of the control groups is to control for potential placebo effects, whereby the mere awareness that they "should" sense a difference between the two bikes might colour rider experience.
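The random assignment described above could be sketched in a few lines of Python. This is only an illustration under stated assumptions: participants are identified by the numbers 1 to 48, and the names GROUPS and assign_groups, along with the seed value, are made up for the example rather than taken from any established protocol.

```python
import random

# Bike ridden in (Trial 1, Trial 2) for each group, as laid out in the design above.
GROUPS = {
    "Experimental Group 1": ("standard", "low"),
    "Experimental Group 2": ("low", "standard"),
    "Control Group 1": ("standard", "standard"),
    "Control Group 2": ("low", "low"),
}


def assign_groups(participant_ids, per_group=12, seed=None):
    """Shuffle the participants and deal them into equal-sized groups."""
    ids = list(participant_ids)
    if len(ids) != per_group * len(GROUPS):
        raise ValueError("need exactly per_group participants for each group")
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    rng.shuffle(ids)
    assignment = {}
    for i, name in enumerate(GROUPS):
        for pid in ids[i * per_group:(i + 1) * per_group]:
            assignment[pid] = name
    return assignment


# Hypothetical participant numbers 1..48; the seed is arbitrary.
assignment = assign_groups(range(1, 49), seed=2015)
first_bike, second_bike = GROUPS[assignment[1]]
```

Shuffling first and then dealing out fixed-size blocks guarantees exactly 12 riders per group, which drawing a random group for each rider independently would not.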

Bicycle: A single bicycle will be used over the course of the study, fitted, as appropriate, with either the manufacturer's standard fork (resulting in mid trail), or a modified fork with longer rake (resulting in low trail). The bicycle's saddle and handlebar height will be fitted for each participant individually.

Course: The course over which participants will cycle should be approximately 3-4 miles long, and include aspects that make front-end geometry noticeable, such as tight turns, descents, bends.

Questionnaire: The questionnaire used to gauge riders' feedback should be designed by a team of social scientists with experience in conducting similar studies. A combination of free-form and Likert scale questions will likely be used.

After all 48 participants complete the study and questionnaire data is collected, statistical analyses will be conducted to compare riders' responses to the low trail vs standard bicycles, once other factors are controlled for. The analyses will also test for patterns based on rider characteristics such as gender, height, weight, age, and cycling experience.
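One possible analysis, and by no means the only one, is to compare how much riders' ratings shift between trials when the bike really changed (experimental groups) versus when it did not (control groups). The sketch below uses a simple permutation test on the difference in group means. The function name, the rating scale and all of the numbers are invented purely for illustration; a real study would have its analysis plan drawn up by a statistician.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly relabel who belongs to which group
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # The add-one correction keeps the estimate from ever being exactly 0.
    return (hits + 1) / (n_resamples + 1)


# Invented data, for illustration only: absolute change in a 1-7 "stability"
# rating between Trial 1 and Trial 2 for a handful of hypothetical riders.
experimental_change = [2, 3, 1, 2, 4, 2, 3, 1]
control_change = [0, 1, 0, 1, 1, 0, 2, 0]
p = permutation_p_value(experimental_change, control_change)
```

A small p here would suggest that ratings move more when the fork is actually swapped than when riders merely believe it has been, which is the placebo comparison the control groups are there for.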

Anecdotal evidence suggests that riders' perceptions of a bicycle's handling differ, and, consequently, so do their preferences for low vs mid trail bikes. I expect the results of this study to confirm this. Statistical analyses should show a split both in the types of characteristics participants attribute to one bike vs the other, and in participant preferences for one over the other.

It is not incompatible with these predictions that an overall pattern in rider perceptions and preferences may nonetheless emerge (i.e. that riders on the whole tend to prefer X trail, and report such and such characteristics for low vs mid trail handling). I personally have no predictions as to what shape this pattern will take. Either way, the results might prove informative for bicycle manufacturers, framebuilders, and others in the industry for whom the topic is of relevance.


Soooo dear readers! If you have managed to read this far, congratulations (and double points if you don't have a drink in hand)! And if you are wondering what the point to all of this is - as surely it is unrealistic to expect any interested party (Bicycle Quarterly comes to mind) to dedicate the time and financial resources necessary for a study of this type to actually be developed and conducted professionally - it is mainly this: In order for anyone to make claims about what any aspect of a bicycle's construction, ride quality or handling feels like to "people in general" (rather than just to themselves and their similar-minded friends), they would pretty much have to conduct a study like this. A study designed by someone who knows how to do it right, involving a statistically significant number of randomly-recruited participants, controlled conditions, etc., etc. Otherwise, we are all just asserting our own preferences with nothing but cherry-picked anecdotal evidence to back them up. Not that there's anything wrong with that, if it's all in good fun. But if you want to be scientific about it? Offer some food for thought to your friendly neighbourhood social psych grad student, and see if they're up for a project!


  1. I LOVE it! Several comments:

    1. What's the time required for the fork swap? Will that be impractical in terms of having the rider stick around for the second ride? And if the gap is too long, will they forget the ride quality of the first ride?

    2. Is 3-4 miles the right distance? I think of all of the terrain that I cover in a single 1.5 mile lap of a CX course. On the one hand, 1.5 miles seems like plenty. On the other hand, might the feel be more similar at the outset but more different after an extended time in the saddle?

    1. I'd say an experienced bicycle tinkerer could swap the fork in 10 minutes. It could be done while they're filling out the questionnaire and sipping coffee, which would take 20-30 mins.

      The mileage is really up to the person designing the study. You can certainly do all sorts of maneuvers over a 1/2 mile course, but is it enough to form a lasting impression of the bike? I don't actually know the answer to that, just thinking out loud.

  2. I think your experiment would work better with two different bicycles, identical except for the fork. It would be a lot more practical.
    I'd also like to point out the complexity of the problem you're discussing originates in physics, but is more properly a problem in mechanical engineering, and it is much more complicated than it appears. There's a PhD thesis on self-stable bicycles, and one of the things that came out from that study is that it's not that easy to describe what makes a bicycle self-stable (that is, it will stay up by itself, given a push). If you can't do that, it's easy to see how difficult it would be to investigate rider preferences.

  3. While we're being possibly over-the-top: it needs double-blinding. Should be pretty straightforward: the person explaining the test to the user shouldn't know which group the tester is in. A person who does know which bike this tester is getting places the bike in some sort of start area out of view of the test explainer and the tester. (Out of view of the explainer because they might be able to tell at a glance which bike it is.)

  4. The picture really clinches the nerdiness! :)


  5. I am sure that some of your readers will be far more expert on the subject than I, but I've come across "ABX" testing in the field of music.

    The test subjects are played music using technology "A", then the same music using technology "B", then the same music using technology "X" where "X" is either "A" or "B" selected at random and the subjects are asked if it is "A" or "B".

    So people aren't asked what they prefer, because with music this can be subjective, but simply whether they can tell the difference between the two different technologies. Without going into the maths to make sure the results are statistically significant, if someone gets "X" right about 50% of the time, it basically means they can't tell the difference between "A" and "B" and they might as well toss a coin.

    Of course, HiFi magazines very rarely actually do this type of testing, firstly because it is expensive and time consuming and secondly, because it can destroy too many sacred cows. They may find people can't tell the difference between an expensive piece of equipment from a manufacturer who spends a lot on advertising and something a lot cheaper. (Blind testing did show that people couldn't tell the difference between expensive speaker cables and wire coat hangers.)

    So I'm wondering if this could be an alternative way to run the trial. You actually tell people what type of bike they are riding, "A" then "B", then randomly select "A" or "B" and see if a statistically significant number can tell what it is. Maybe the trial could ask people if they prefer "A" or "B", but if they are unable to tell the difference when they don't know what bike it is, does their preference mean anything...?
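For the curious, the "maths" behind an ABX result can be sketched as an exact one-sided binomial test: how likely is it that pure guessing would produce at least this many correct identifications? This is a minimal sketch only; the function name abx_p_value and its chance parameter are made up for the illustration, and it assumes every guess is independent with a 50% chance of being right.

```python
from math import comb


def abx_p_value(correct, trials, chance=0.5):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Example: 9 correct identifications out of 10 ABX rounds.
p = abx_p_value(9, 10)  # 11/1024, about 0.011
```

A result hovering around half correct gives a p-value nowhere near small, which is the coin-toss situation described above.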

  6. Ahhh, but a remaining weakness is that you are still only measuring reported preferences. We need hard data. Part of the test track should have path lines painted on the ground. Test subjects will be instructed to follow the lines as closely as possible at normal riding speed. Rides will be videotaped to allow detailed analysis of their ability to follow straight and curved paths accurately. Tests should be repeated both on an unladen bike and one with front and rear loads for comparison. Ideally, test subjects should wear motion capture suits (body suit with ping pong balls at key points) to facilitate importing their movements into motion capture software for more detailed analysis.

    Future trials should add road surface conditions as an added variable to ascertain impacts to ride quality of a variety of road surface textures.

    You're never going to get that grant approved if you don't think big.

  7. OH MAN! I would love to participate! The fact that I'm already pretty biased on this subject could be overcome simply by telling me that Jan Heine believes one thing and Grant Petersen believes another and I won't know what to believe as I strike out to ride your bikes. There are folks I respect on both sides of this issue although, of all the people who have expressed opinions, Jan and Grant seem like two of the most reasonable and rational.

    While worthwhile, your test wouldn't change the minds that might need to be changed, but it could (possibly) prevent a few people from forming uninformed opinions that they would then have to defend to the death for the rest of their lives. The cycling world would owe you a great debt for that alone.


    Oh yeah, at what point in the test do you apply the electric shocks?

  8. "Pained attempt at a catchy title...?" That's a great title!

    Also, having little formal education in the sciences, I'm curious how you arrived at 48 participants. I understand the need for the four groups, but why a minimum of 12 in each? (Please don't feel obligated to write a separate essay explaining statistical analysis to me...)

    1. Ha. That's just a jokey reference to how even the most boring studies try to make up "sexy" titles to make themselves more appealing to journals for publication; at least that's how it is in psych and neuroscience research.

      I no longer remember how the 12 min to a group guideline was derived exactly, but it's specific to behaviour/attitude studies where participants do tasks, and then groups are compared. I am sure social psych statistics manuals will have more detail.

  9. Perhaps Max has a point. I think you really need four bikes permanently set up, two of each, so that everybody can see that they have a different bike but will not know if it is actually set up differently. It would also speed up changeovers. Is twelve in each group enough to be statistically significant?

    1. It's a judgment call. Using multiple bikes exposes you to the potential criticism that the bikes were not in fact identical. Either way, there is always something.

      n=12 is enough (but the min) for this kind of study

  10. It pains me to say this, but I must take issue with your experimental design. Trial 1 features 4 mountains, whereas Trial 2 only 3.

    Journal of Attitudes and Motion

    1. Your observation is duly noted and appreciated, Daniel.
      Unfortunately I cannot control for erosion between T1 and T2.

  11. This is pretty nicely done! Some suggestions: Definitely 2 identical bikes. Cover the forks with something (paper bags?) to make sure that people can't visually gauge the fork offset. Make sure your course includes very tight turns, high speed cornering, stand up climbing or at least a moderately steep climb, and other special parts of cycling where fork offset can make a difference. You may not see much difference in easy "around the block" riding. A lot of the value of this will be determined by the nature of the course, and by your questions. I also like the "ABX" technique!

  12. As a social psychology post-postgrad, I do love this a lot.

  13. I love this so much. There really is very little substantial scientific research done on bicycles, but there are millions of cyclists that could benefit from it, and this experiment is more or less perfect. Double blinding is unnecessary as there wouldn't be a strong participant-researcher interaction unlike a doctor prescribing medication.

  14. If you completely randomized the test you would end up with people who don't actually ride a bike.

    1. Those people could push the bikes - a sub-study ;)

  15. "it's a judgment call. Using multiple bikes exposes you to the potential criticism that the bikes were not in fact identical. Either way, there is always something."

    Assuming the tyres, tyre pressures, and all components were identical, I think it'd be close enough. Any tiny differences between two mass production frames should be dwarfed by fork differences.

    For masking the fork - a handlebar bag on each bike should do the job. Especially if participants don't know it's a fork test. Not many people would see a different fork bend while walking up to the bike and mounting.

    But I think subjective feelings can be pretty accurate anyway. I fitted a set of tyres to my tourer. After 15 miles I thought SLOW!!!! and harsh handling. Did a few roll down times on a local hill then swapped them for better tyres. There was around 10% difference in the times. After all, experienced cyclists notice a cm or less change in saddle height. I'm sure a change in trail would be felt by most riders.

  16. Have a world class racer ride the course and use a computer and servomotor to recreate that speed profile. Which bike allows the average rider to hang on longer before they crash?

  17. You need more than 2 bikes, simply because there may be some undetected difference in one bike - say frame alignment - that can be felt by all participants. These riders might report that Bike A is harder to ride straight, but it's not because of the geometry, but because the frame isn't aligned as well as Bike B.

    So you need to duplicate at least one of the bikes, so the testers ride three bikes, of which two are identical, and they are asked to report on all three. You could either tell them that all three are different, but that might not get you good responses. Better might be to tell them that one is different, and ask them to identify it and tell you how it's different.

    That is how we tested frame stiffness for Bicycle Quarterly. We had three bikes, two of which were identical, and one was different. We didn't have 48 testers - unfortunately, our resources were already stretched by constructing three identical bikes. But then, we didn't intend to make statements about the general population at large, but only show that what we thought we felt in BQ's bike tests was actual, and not just in our heads. Basically, could we tell the "odd bike out" (with either a stiffer or a more flexible frame, we didn't know which) in a double-blind test. Two of three testers could. We deliberately made the differences between the bikes very small - it's possible that all testers could have determined the different bike if the differences between the frames had been larger.

    It's a fascinating subject, and I am glad you pointed out that doing a good study isn't as easy as it may appear at first.

  18. Ambitious and Hilarious at the same time!

    I have to sort of disagree with the idea of handling being subjective!

    The cause and effect as you noted are well known, the differences that come into play are due to the human element, but for other reasons than just subjectivity. Namely, we are not all built the same. A very small rider will feel the bike differently than a very tall one; the builder can minimize the difference by using smaller wheels on smaller bikes and larger for larger bikes, but this is still only part of the equation. There are several different body types as well! I for instance have a longish torso with not so long legs, my buddy on the other hand has extremely long arms and legs and while his torso is not short it's not as long as mine proportionally to his arm/leg length. Still others probably have arms, legs and bodies that are more evenly proportioned. This affects bicycle set up which can have a massive effect on the handling or "Feel" of the bike! A REALLY GOOD experienced builder could probably build a bike for each individual person that would handle the same, but the average off the shelf bike by necessity is going to be a generalization, a compromise!

    This is part of the reason I have never left a bike the way it was when I bought it! Invariably I have to change the stem (Longer), Sometimes the seat post (different offset)!

    I cheer your enthusiasm, but I fear it might be misplaced!?

  19. I very much doubt you will get statistically significant results even with 48 testers. It will come down to the questions. There are experts who may be able to help, but the questions should get a 1-5 or 1-10 result if possible. "Rate on a scale from 1-10..." The person will always answer differently on the second ride because they will know the questions you asked the first time.

    Here's a better idea:

    Develop a webform that queries hundreds of people on the handling trait in question. Ask them for their measurements, bike size, and bike make and model/year. Cross reference their results with the trail data from their bike model. So you're asking everyone about their own bike. You use public data to map the trail stat to the bike model the survey-takers give you. You can also find pockets of data around their demographic (bars higher than seat, user height, user weight, improperly fit bike, etc etc).

    1. You are working under the impression there is a lot of variety in trail for production bikes. There is not. Production bikes are tightly grouped around 57mm trail. More variance on the high side than on the low side. Low trail bikes are Bromptons, cycle trucks (and not all of those) and customs. Bikes that are even sorta lowish trail just aren't common. Experimenting with low trail for the masses is not something a large manufacturer should do.

  20. Or use two frames. Let half the riders use them then switch the forks between the bikes for the second half.

  21. Not to be a pill.....

    I wonder if you could emphasize quantifiable objective outcome variables? A questionnaire will (hopefully) capture how a rider feels on a bike, but issues of power transfer and stability would seem to be compelling.

  22. This sounds like you're opening yourself up to the "Pepsi Challenge Effect" where there might be a preference for one style for a short time (the sweeter Pepsi), but for a greater duration, the other (less sweet Coke) might be preferable, whether due to effort/attention required or other differences that aren't perceived in a brief, non-real world usage scenario.

    Better make it a century :)

  23. I would add a request that the tester rides each bike no-handed (at their own risk of course) and comments on the ease of so doing.

  24. I did some test rides this week on a prototype/preproduction sample citybike. After sitting in a box for 6 years it was given to a friend of mine by an Industry guy. My friend knows a few things about bikes, he opened a bike shop in 1964, worked there every day 'til '74, remained a partner in the shop into the mid-90s. He said it was the worst handling bike he'd ever had and wondered if I could discover why.

    One glance told me it was high trail. Measurement made it 71 degrees head and 43mm of rake. Very high trail numbers. But it rode nothing like high trail. It darted, hunted, and wandered. Above 10mph it was queasy. One and only one attempt to give it a little gas made the whole front end shake at about 15mph. The headset seemed good. It passed my basic alignment checks and had already been checked at a bike shop. Hmmmn. There were three things clearly wrong with the bike. The handlebars were just too wide, and more suited to a MTB than a city bike. They couldn't be shortened as there was a bend just past the large grip/brake/shifter assembly. Then the seat tube was very steep. It measured 77 degrees. The saddle nose was almost directly above the BB. Close enough there was little reason to measure. Finally the toptube was very short. With the moderne frame design and highly shaped tubes I couldn't even figure out where to measure. It was short. Finally let me say there was no flex issue in the usual sense. With oversized and shaped aluminum tubes this frame was ready for a 400# rider.

    So I went and bought a layback seatpost. The OEM saddle had almost no possibility of adjustment, I pulled a suitable alley find from the parts heap. The original saddle was round filed. Now we had about 6-1/2cm of saddle setback. The owner liked the way the bike rode. I found it acceptable. It was queasy still above 15mph but I was able to do about 20 without that violent shake. Good enough. It still doesn't ride anything like a high trail bike.

    Industry guy told us that while this exact model was not produced, the same frame without the expensive to fabricate rack and with slightly lesser componentry had sold well at $1100. Owner satisfaction was high and consumer complaints were notably low. Two lower end bikes had used identical geometry. The reviews for all of these bikes had been stellar, even at the publications too small to think about ad revenue. His diagnosis was my friend was just getting old. That one didn't go over so well with me or my friend.

    How to explain high user satisfaction with a bike that was just terrible? The bike was only barely acceptable below 10mph and people liked it anyway? Nice paint and nice features and a name brand trump handling?

    I have an untestable theory for why the bike behaved so badly. The steer tube was about a yard long. The head bearings while free-turning without any looseness were a novel design. I couldn't even figure out where the balls would have been. It could have been running in bushings for all I could tell. The head bearing seemed fine in the stand but who knows what happens with a novel design in real use. I think there was a resonance in that long long steer column. No mechanical engineer worth his paycheck would mess with something as simple as a threaded cup-and-cone HS, a system that has delivered stellar service for 125 years. This bike was designed by marketeers out of control.

    The basic takeaway here is looking at the trail number doesn't tell you much. Bikes are more complex than that. Center of gravity issues are just as important as trail. Biomechanics setup is just as important as trail. And all factors are well interconnected. When in doubt stick with the simple and the proven. And go for a bike that looks mechanical, not one that looks like it was designed in the art department and in marketing.
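For reference, the trail figure itself follows from head angle, fork rake and wheel radius by simple trigonometry: trail = (R * cos(A) - rake) / sin(A), with the head angle A measured from horizontal. Here is a minimal sketch. The function name is made up, and the 340 mm wheel radius is purely an assumption, since the comment above gives the head angle and rake but not the wheel size.

```python
from math import cos, radians, sin


def trail_mm(head_angle_deg, rake_mm, wheel_radius_mm):
    """Trail in mm, with the head angle measured from horizontal."""
    angle = radians(head_angle_deg)
    return (wheel_radius_mm * cos(angle) - rake_mm) / sin(angle)


# 71 degree head angle and 43 mm rake as measured in the comment above;
# the 340 mm wheel radius is an assumed value, not a measurement.
trail = trail_mm(71, 43, 340)
```

With those inputs the result comes out at roughly 72 mm, which does sit at the high end of the range. More rake or a steeper head angle brings the number down, which is how a longer-rake fork produces low trail.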

  25. Along with being a stealth reader of this magnificent blog, I am a university professor who helps to direct an Honors program at a midsize university in Florida. I would be happy to get some of my Honors students working on this if we could come up with a little bit of funding and/or a couple of donated bikes. My campus has a bike shop and bike rental program, so we have all the tools and expertise we would need.

    1. Oh very nice! Get in touch with me over email if you want to discuss it.