Individual Contribution to Group Work (Help Needed)

Group work seems quite important to the learning process, and yet, at least in my sample size of 1, it is unfortunately rare during schooling. A reason for this seems to be that grading group work is difficult and often unfair: it is difficult for a grader to see into the group mechanics to discern who is contributing, and yet giving all group members the same grade does not capture that a few students often do a substantial portion of the work, while the others free-ride.

It should be possible to solve this problem if enough group assignments were done over a period of time, and students were randomly assigned to their groups. The idea would be to evaluate a student’s contribution to a group’s grade by looking at performance across multiple groups.

Here’s a sketch of how this might be possible:
Each group assignment is graded, say from 1-100.
A student’s grade would then be calculated by looking at the average grades of the projects for that student’s groupmates, and seeing whether this project does better or worse.
So, if the average of a student’s groupmates’ project grades is 85, and this group’s project grade is 90, we attribute the overperformance to that student’s influence.
The question is: what do the groupmates tend to earn without a student in their group, and what do they tend to earn with that student in their group?

Perhaps it will be easiest to work through with some sample data. Let’s say there are four students, and groups of two.
Students: A, B, C, D
Project 1:
AB - 95%
CD - 75%
Project 2:
AC - 90%
BD - 80%
Project 3:
AD - 85%
BC - 85%

For student A, we could average together all projects that she worked on, (.95+.9+.85)/3 = 90%
And we could average together all projects that she didn’t: (.75+.8+.85)/3 = 80%

The number I’m interested in is, I think, the difference between those averages. It may not matter here, and it may not be more meaningful than the raw average, but in a larger class, with larger groups, where not every combination of groups can be tried, can’t we approximate the influence of an individual on a group by comparing this difference? Unfortunately, my knowledge of statistics is thin, but I’m looking to measure a difference across several projects that better captures A’s contribution to them.
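For anyone who wants to check the arithmetic, here is a minimal sketch of the with/without comparison for the four-student example above. The dictionary layout and the function name `with_without_difference` are just illustrative choices, not part of any proposed system.

```python
# Project grades from the four-student example: each project maps a group to its grade.
projects = [
    {"AB": 95, "CD": 75},
    {"AC": 90, "BD": 80},
    {"AD": 85, "BC": 85},
]


def with_without_difference(student, projects):
    """Return (mean grade with the student, mean grade without, difference)."""
    with_scores, without_scores = [], []
    for project in projects:
        for group, grade in project.items():
            if student in group:
                with_scores.append(grade)
            else:
                without_scores.append(grade)
    mean_with = sum(with_scores) / len(with_scores)
    mean_without = sum(without_scores) / len(without_scores)
    return mean_with, mean_without, mean_with - mean_without


for s in "ABCD":
    w, wo, d = with_without_difference(s, projects)
    print(f"{s}: with={w:.1f} without={wo:.1f} difference={d:+.1f}")
```

For student A this gives 90 with, 80 without, and a difference of +10, matching the hand calculation.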

Any insight would be helpful.

I’ll work on writing a script to generate some more interesting sample data – including an explicit ‘contribution amount’, so that we can test how well an algorithm approximates it.

Also, I should say that I am interested in reinventing the wheel a bit here. I’m sure this problem is solved, but solving it is more fun than just reading the answer.

Trying to judge individuals based upon a group’s performance. That kind of thinking is so vulgar to me, I don’t even want to participate.

It can be framed less vulgarly: we’re trying to extract individual performance using the performance of groups as a dataset. In that sense, it’s actually removing what I think is the root of the vulgarity. The same technique could be used in companies, combat units, surgical teams, political parties, etc. Isn’t that a worthy cause?

It is a formula for massive deceit and thus would catch on really quickly.

Correlation is NOT Causation

The sample size is probably too small to produce a statistically valid result.

You’re not going to randomize a combat team or a surgical team or a company team. It doesn’t work that way.

Isn’t this equally an argument against all grading?

I agree, and I’m sure there’s a way to show this (though I don’t know what it is). But for some class of size n broken into groups of size k, there’s a number of projects p for which the result becomes statistically significant. After all, for some n,k,p, every possible combination of groups has been tried, and then wouldn’t we be determining individual contribution as reliably as we can for individual assignments?

I have an intuition that in some cases where randomization is rejected out of hand, it would actually be beneficial, and not just because we could then better evaluate individual performance. You’re probably right about combat and surgical teams, but I think company teams would generally benefit from regular shuffling (but it will depend on the company and the nature of the business).

An effective team is not going to be broken up.

What will happen is that there will be some standard for measuring performance. The over-performing teams and under-performing teams will be identified. Someone will investigate the differences between the two. The under-performers will be trained to adopt the behaviors of the over-performers.

People can’t just be ‘shuffled’ and manipulated like objects.

There is a practical limit on the number of group projects that an individual can do in a year/semester. If you are doing a group project, then you are not doing an individual project. And an individual project measures your effort/skill/knowledge directly. It makes no sense to have a ‘large’ number of group projects unless there is some kind of scarcity of resources.

“every possible combination” quickly blows up: … ations.php
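To put rough numbers on how fast this grows, here is a small sketch that counts the distinct groups and the distinct ways to split a whole class into equal groups. It assumes the class size n is an exact multiple of the group size k; the function names are made up for illustration.

```python
from math import comb, factorial


def distinct_groups(n, k):
    """Number of different k-student groups that can be drawn from n students."""
    return comb(n, k)


def distinct_partitions(n, k):
    """Number of ways to split n students into unordered groups of size k.

    Assumes n is an exact multiple of k.
    """
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    m = n // k  # number of groups per project
    return factorial(n) // (factorial(k) ** m * factorial(m))


for n, k in [(4, 2), (6, 3), (25, 5)]:
    print(f"n={n}, k={k}: {distinct_groups(n, k)} possible groups, "
          f"{distinct_partitions(n, k)} possible class splits")
```

With 4 students in pairs this gives 6 possible groups and 3 class splits, which is small enough to try them all; the 25-student case is a very different story.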

Under no circumstances will anyone with an intelligent child allow this sort of chaos to affect their child’s grades.

We have a system that collectively grades already in school, it’s called sports, and they have coaches paid by the community. You’re exposing your future Harvard Graduates, seeking to graduate with Honors, to being cograded with a future idiot who will drop out in the 11th grade, and will open up a Meth lab in their own kitchen.

It is deeply, deeply rotten and unfair.

You want to get them working together as a group, then set up an obstacle course in the gym that can’t be completed without cooperation. Don’t grade it, just let them do it. The MENSA student might get trapped in a maze because her teammates can’t solve how to open the latch on the opposite side, but at least she can go to a good college after the experience as it’s not embedded in her permanent record.

Don’t put the smart at the mercy of the stupid. It’s Tyranny of the Majority Territory. Our democracy has always managed to balance group democracy without unnecessarily hindering personal advancement, for the most part. It’s not time to start chaining our best to the stupidity of our worst. We want our best to succeed, untarnished, unblemished.

Oh, for heaven’s sake.
Science has this concept called “falsification”. Basically it means that the results could not have occurred if the hypothesis was not true. When you have an individual take a math test (for example) with 10 independent questions, reasonable security against cheating, and the individual gets all of the questions right, it is certainly reasonable to accept that the person knew how to do the math and thus performed.

What you are suggesting is:
“You know, we didn’t have all of these disease problems until that Jewish family moved into town”.

If you can’t provide reasonable falsification (doesn’t have to be perfect), then DON’T JUDGE PEOPLE!!!

If you work with PBL in the classroom, you can actually walk around and get a sense of individual participation and roles. You can further ask the students to evaluate their own and other people’s participation and contributions. And certainly seeing what happens over time in a variety of groups should help. Numerical evaluation is always going to look arbitrary because it is ill suited to evaluating a human, but even with this silliness embedded in the pedagogical process - there are of course progressive and other exceptions - one can still come up with a decent approximation.

But we don’t make kids do projects in order to evaluate them, we make them do projects in order to teach them. If group learning is actually more like what will be demanded of people when they graduate into the work force, it’s likely that group work in school will better prepare them for it. We can either have them do individual work, which will be a worse education, or we can figure out how to fairly grade group work.

Yes, but can we have a statistically significant measure of contribution before every possible combination? I mention “every possible combination” only to say that we know that at the maximum, we have statistical significance. But what’s the minimum?

This method seems quite subjective, and in some settings it doesn’t sufficiently control for implicit biases. For me, starting in high school teachers started grading anonymously in order to avoid favoritism or discrimination. With the system I’m proposing, that could still be done.

But probably a certain degree of subjective input would make the system better. There is an anecdote about the Israeli army using quantitative methods to place recruits, instead of the traditional soft evaluations. This change improved outcomes, but left those making the placements feeling cold. They added subjective evaluations back in as one of many quantified criteria, and outcomes improved further. So perhaps best would be for an individual’s grade to be some composite of their effect on group project outcome (what I’d like to measure) combined with the subjective evaluations of their peers and instructors.

If the purpose of group work is to teach, then why are you looking for a way to “fairly grade group work”??
They learned by doing the group work. The grade is irrelevant.

I know that this is America and the quarterback or pitcher gets almost all the credit for a win/loss…
But doesn’t a team share the glory/humiliation among the members … even when some perform poorly or exceptionally?
There is no ‘I’ in team… right?

It’s been a while and I would have to research it. My intuition tells me that the number of projects required would be too many to be practical.

:-k Some stuff you learn from team/group work :

  • some people will take advantage and let others do work which they are capable of doing

  • some people are not capable of doing the current work, although they may be capable of doing some other work

  • some people are distracted by personal issues

One concludes that sometimes you have to carry the other members of your team and sometimes they carry you. You can’t put a grade on that.

As you point out, “this is America”

The goal is to teach, but we also want to be able to evaluate learning and ability. It seems like the ideal system would be one that chooses the best pedagogy for learning, and maps an effective method of evaluation on top of it. I am under the impression group work is a better pedagogy than individual work (or at least that mixed group and individual work is better than individual work alone).

I think people say this more as a way to motivate team members. Oneness with the group is a powerful mindset, and it enables many actions that aren’t justified on purely individualistic grounds. But there are still individuals, and the best players can still be more or less reliably identified.

I’m worried this might be the case, but I’m hopeful too. We have a few variables to play with: n (the size of the class), k (the size of the group), and p (the number of projects).

We also have the option of using non-random groups to get better information. One way might be to make it random with restrictions (e.g. random for first project, no overlap between first project and second project group, no overlap between second and third project groups, etc). Another might be to intentionally group together or keep apart certain students based on their grades on previous projects, in order to distinguish among them. It might be that putting the best students with the worst students will tend to differentiate them more, or that intentionally balancing the groups will tend to show which students are pulling extra weight and which are not.
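As a rough sketch of the "random with restrictions" idea, the snippet below builds a second set of groups in which no two students who just worked together are paired again. It is only one possible construction, not a uniformly random one, and it assumes equal-sized groups with at least as many groups as members per group (true for 25 students in groups of 5). The function names are hypothetical.

```python
import random
from itertools import combinations


def random_groups(students, k, rng):
    """Shuffle the students and cut the list into groups of size k."""
    pool = list(students)
    rng.shuffle(pool)
    return [pool[i:i + k] for i in range(0, len(pool), k)]


def regroup_without_overlap(prev_groups, rng):
    """Build new groups so that no pair from prev_groups shares a group again.

    Each old group sends its members to k different new groups, so every new
    group contains at most one student from any old group.
    """
    m = len(prev_groups)
    k = len(prev_groups[0])
    if m < k:
        raise ValueError("need at least as many groups as members per group")
    old = [rng.sample(group, k) for group in prev_groups]  # shuffle within groups
    rng.shuffle(old)  # shuffle the order of the old groups
    new_groups = [[] for _ in range(m)]
    for g, members in enumerate(old):
        for j, student in enumerate(members):
            new_groups[(g + j) % m].append(student)
    return new_groups


def shared_pairs(groups_a, groups_b):
    """Return the set of student pairs that appear together in both groupings."""
    def pairs(groups):
        return {frozenset(p) for g in groups for p in combinations(g, 2)}
    return pairs(groups_a) & pairs(groups_b)


rng = random.Random(0)
first = random_groups(range(25), 5, rng)
second = regroup_without_overlap(first, rng)
print(second)
print("repeated pairs:", len(shared_pairs(first, second)))
```

Chaining calls (second from first, third from second, and so on) only prevents overlap between consecutive projects, which is the restriction described above; pairs can still recur two projects later.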

I’ve written a short program to generate test data, written in python. I’ll add it here if anyone wants to play with it or mock me for how bad I suck at programming.
[tab][code]import random

students = list()

# Create 25 'students', each a two item list:
# - An index number (as an identifier), and
# - A random integer between 0 and 100 (their 'contribution')
for i in range(25):
    students.append([i, random.randint(0, 100)])

# Print the students, numbered 0 to 24 (python starts list indexes at zero).
for i in students:
    print(('Student %s' % str(i[0])) + ": " + str(i[1]))

projects = list()
projStudents = list()

# Create 10 projects, in which the students will be grouped randomly into groups of 5.
for i in range(10):
    # Copy every student into this project's pool, then shuffle the pool.
    for j in students:
        projStudents.append(j)
    random.shuffle(projStudents)
    # Each project will generate a list of groups with the scores they receive.
    grouplist = list()
    while projStudents:
        group = list()
        # There has to be a better way to do this, but this is a quick and dirty way to pick 5 random students.
        group = [projStudents.pop(),projStudents.pop(),projStudents.pop(),projStudents.pop(),projStudents.pop()]
        grade = 0
        for j in range(5):
            grade += group[j][1]
        # This is a roundabout way to get the average contribution, which is the group's grade.
        grade = grade/5
        # The grade is added to the group as the 6th value
        group.append(grade)
        grouplist.append(group)
    # Add this grouplist to the projects list, and go to the next project.
    projects.append(grouplist)

print('\nFinal outcomes : \n')
x = 0

for i in projects:
    # Print the projects, numbered 0 through 9
    print(('project %s' % x) + ":")
    for h in i:
        # Collect the index numbers of the five students (skip the grade at position 5).
        printlist = list()
        for idx, item in enumerate(h):
            if idx != 5:
                printlist.append(item[0])
        print(printlist)
        print('Score = ' + str(h[5]))
    print('')
    x += 1[/code][/tab]
And here’s some sample output:
[tab][code]Student 0: 86
Student 1: 91
Student 2: 8
Student 3: 10
Student 4: 15
Student 5: 33
Student 6: 82
Student 7: 17
Student 8: 99
Student 9: 18
Student 10: 24
Student 11: 20
Student 12: 95
Student 13: 35
Student 14: 90
Student 15: 91
Student 16: 35
Student 17: 99
Student 18: 40
Student 19: 13
Student 20: 25
Student 21: 8
Student 22: 3
Student 23: 15
Student 24: 80

Final outcomes :

project 0:
[16, 8, 2, 7, 9]
Score = 35.4
[18, 14, 23, 6, 1]
Score = 63.6
[17, 15, 10, 24, 4]
Score = 61.8
[0, 19, 13, 12, 3]
Score = 47.8
[11, 5, 20, 21, 22]
Score = 17.8

project 1:
[21, 8, 13, 5, 3]
Score = 37.0
[16, 12, 6, 19, 17]
Score = 64.8
[0, 24, 1, 22, 15]
Score = 70.2
[10, 7, 11, 4, 9]
Score = 18.8
[18, 23, 2, 20, 14]
Score = 35.6

project 2:
[11, 2, 20, 18, 5]
Score = 25.2
[21, 14, 12, 10, 19]
Score = 46.0
[8, 24, 1, 6, 0]
Score = 87.6
[7, 22, 13, 23, 16]
Score = 21.0
[17, 9, 15, 4, 3]
Score = 46.6

project 3:
[11, 1, 20, 10, 2]
Score = 33.6
[14, 17, 22, 24, 12]
Score = 73.4
[0, 5, 13, 19, 23]
Score = 36.4
[16, 21, 15, 8, 7]
Score = 50.0
[18, 4, 6, 3, 9]
Score = 33.0

project 4:
[1, 0, 24, 21, 22]
Score = 53.6
[2, 8, 16, 18, 23]
Score = 39.4
[12, 6, 7, 9, 17]
Score = 62.2
[11, 15, 3, 10, 5]
Score = 35.6
[4, 19, 13, 14, 20]
Score = 35.6

project 5:
[11, 15, 13, 7, 10]
Score = 37.4
[16, 5, 9, 4, 12]
Score = 39.2
[2, 18, 22, 3, 0]
Score = 29.4
[24, 1, 20, 21, 8]
Score = 60.6
[14, 19, 23, 17, 6]
Score = 59.8

project 6:
[23, 12, 3, 6, 20]
Score = 45.4
[14, 15, 2, 24, 16]
Score = 60.8
[19, 9, 22, 11, 8]
Score = 30.6
[21, 13, 7, 0, 18]
Score = 37.2
[4, 10, 17, 1, 5]
Score = 52.4

project 7:
[16, 5, 17, 13, 3]
Score = 42.4
[23, 6, 21, 2, 1]
Score = 40.8
[7, 20, 12, 8, 14]
Score = 65.2
[15, 0, 19, 11, 4]
Score = 45.0
[10, 18, 24, 9, 22]
Score = 33.0

project 8:
[15, 24, 23, 21, 5]
Score = 45.4
[18, 19, 4, 14, 16]
Score = 38.6
[13, 8, 1, 3, 17]
Score = 66.8
[0, 12, 7, 10, 9]
Score = 48.0
[6, 11, 20, 22, 2]
Score = 27.6

project 9:
[23, 13, 21, 4, 19]
Score = 17.2
[24, 9, 12, 10, 5]
Score = 50.0
[15, 22, 16, 8, 6]
Score = 62.0
[17, 11, 18, 3, 1]
Score = 52.0
[2, 14, 0, 20, 7]
Score = 45.2[/code][/tab]
This output lists 25 students (numbered 0 through 24) and their ‘contribution’ scores. Grades are just the average of the contribution scores (that’s a gross simplification, but useful for present purposes). Then it lists the 10 projects, numbered 0-9, showing the students in each group and then the group score (which should be the average of those students’ contribution scores).

It’s an open question whether this models actual group interaction, but my approach right now is to see how much I can determine from a simple model, and then add complexity and tweak the solution to account for it.

The usefulness of this simple model is that we know each student’s contribution score, and we can see how well we can approximate that score looking only at their group scores. If that’s possible, that should alleviate some of the complaints voiced here so far that looking at group scores alone is ‘vulgar’ or ‘unfair’. If we can approximate an individual character trait well while looking only at group work, we can fairly evaluate individuals without sacrificing pedagogy.
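Here is a rough sketch of one way to run that test, using the with/without difference from the first post. It assumes the simple averaging model (group score = mean of member contributions) and equal-sized groups. Under those assumptions the expected difference works out to (contribution minus the mean of the other students) / k, so the sketch rescales it as class mean + k * difference * (n - 1) / n. That rescaling is my own working under this model, not an established method, and the function names are invented for the example.

```python
import random


def simulate(contrib, k, p, rng):
    """Return p projects; each project is a list of (members, score) pairs.

    Assumes len(contrib) is a multiple of k and that a group's score is
    exactly the mean of its members' contributions.
    """
    n = len(contrib)
    projects = []
    for _ in range(p):
        order = list(range(n))
        rng.shuffle(order)
        groups = [order[i:i + k] for i in range(0, n, k)]
        projects.append([(g, sum(contrib[s] for s in g) / k) for g in groups])
    return projects


def estimate(projects, n, k):
    """Estimate each student's contribution from group scores alone."""
    scores = [score for project in projects for _, score in project]
    class_mean = sum(scores) / len(scores)
    estimates = []
    for s in range(n):
        with_s = [sc for project in projects for g, sc in project if s in g]
        without_s = [sc for project in projects for g, sc in project if s not in g]
        diff = sum(with_s) / len(with_s) - sum(without_s) / len(without_s)
        estimates.append(class_mean + k * diff * (n - 1) / n)
    return estimates


rng = random.Random(0)
contrib = [rng.randint(0, 100) for _ in range(25)]
for p in (10, 100, 1000):
    est = estimate(simulate(contrib, 5, p, rng), 25, 5)
    mae = sum(abs(e - c) for e, c in zip(est, contrib)) / len(contrib)
    print(f"{p} projects: mean absolute error = {mae:.1f}")
```

Because each estimate is an average over random groupings, its noise should shrink roughly with the square root of the number of projects, so comparing the error at 10, 100 and 1000 projects gives a feel for how large p would need to be in this idealized model.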

Based on what?

But the best players are not found by looking at the team result, but rather individual performance within a set of games. Looking at team results is much too coarse.

That would invalidate a statistical approach since it introduces a bias.

How did you model student ability and results?

One would expect the students to display ability in a subject which should be a normal distribution.

One would also expect the students to display an ability to work within a team … which I tend to think is also normally distributed.

How ability and team work ability is related - that I don’t know. :-k

Let me propose this scenario :
Student A individually gets 50% in a subject.
Student B individually gets 30% in a subject.

They do a project together and the mark is a 50%.

What does this mean for their individual performance within the group?

Did Student A do all the work?
Did Student A tutor Student B and now Student B has a 50% level of understanding?
Was there effective teamwork? Is 50% a sign of good or bad teamwork? Student B achieved better than expected but Student A only achieved the usual.

How much should the marks of Student A and Student B be adjusted?

Mostly anecdote. It seems that most of adult life is more like group projects than individual assignments, and that the skills group work builds are more important than those that individual work builds. It also seems that group work requires most if not all of the skills that individual work requires, while individual work does not require the skills that group work requires.

I admit it’s a weak basis, but I’m OK with that and still find the problem of allowing for mostly-groupwork schooling to be worth solving.

This must be in part because of the way sports tend to work. Even in little league, teams are fixed and the same group plays against many opponents. If we had data from how teams perform when individuals are swapped around within them, we could use that data to evaluate the individuals – and I believe this is done in professional sports, where players are regularly traded. It should also be possible to judge individual contribution by looking at whole-team performance while a player is on the field vs. when that player is off the field in games where players are rotated, like soccer or basketball.

It would have to be done carefully, but I don’t think by itself it would invalidate any approach. Fortunately, with the mocked-up data set and data set generator program, we can test the hypothesis by first using a fully random group selection process, and then comparing it against a partially random or rule-based selection process.

The model is very simple right now: it just takes the average of the contribution scores for a group. In reality, people’s contribution will depend in part on who else is in the group. A very individually capable person may lose it if they’re in a group with a bully or a crush, etc. etc. But that’s just noise. Again, at some number p of group projects, we know we’ll get reliable individual data; we just need to find out if p is a number that could be accomplished in a reasonable amount of time.

I don’t know. How do Student A and Student B do on the next project? I don’t think that one can pull out individual information from a single group data point. But let’s say we have 50 projects, and we know that A never gets less than 50%, and B never gets less than 30%, and when A works with any other student who sometimes gets less than 50%, the group gets 50%, and when B works with any other student who sometimes gets less than 30%, the group gets 30%. Then can we reach a conclusion about their individual contributions?

Your scenario does raise the interesting point that the outcome will depend on the statistical model we use, and that a solution will likely fit the data better or worse depending on the model that’s producing the data. However, this seems to be a general problem with evaluation, it’s just not as apparent when it’s individual work (indeed, it might be that group work compounds the problem with the added uncertainty of contribution).

If you insist on a brute force method, I suggest that you start with 4 people working in groups of 2 - that’s only 6 unique teams and 3 projects.

For example, students 1 to 4 have the following individual performance out of 100%
1 : 25%
2 : 40%
3 : 60%
4 : 75%

If the teams produce a result which is the average of the individual scores then these are the results:
1,2 : 65/2=32.5
1,3 : 85/2=42.5
1,4 : 100/2=50
2,3 : 100/2=50
2,4 : 115/2=57.5
3,4 : 135/2=67.5

Then if those results are averaged over all three groups of a particular student:
1 : (32.5+42.5+50)/3 = 41.7
2 : (32.5+50+57.5)/3 = 46.7
3 : (42.5+50+67.5)/3 = 53.3
4 : (50+57.5+67.5)/3 = 58.3

How does this relate to the individual performance that we started with??
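One way to look at that relationship, sketched under the same assumption that a pair's result is exactly the average of the two individual scores and that every possible pair has been tried: each student's average is then a fixed linear function of their individual score, so it can be inverted. With n students in pairs, average = ((n - 2) * individual + total) / (2 * (n - 1)), where total is the sum of all individual scores, which also equals the sum of the averages. The function names here are illustrative only.

```python
from itertools import combinations


def pair_averages(individual):
    """Average pair result for each student over every possible partner."""
    n = len(individual)
    totals = [0.0] * n
    for a, b in combinations(range(n), 2):
        score = (individual[a] + individual[b]) / 2
        totals[a] += score
        totals[b] += score
    return [t / (n - 1) for t in totals]


def recover_individual(averages):
    """Invert the averaging model to get back the individual scores."""
    n = len(averages)
    total = sum(averages)  # equals the sum of individual scores in this model
    return [(2 * (n - 1) * a - total) / (n - 2) for a in averages]


individual = [25, 40, 60, 75]
averages = pair_averages(individual)
print([round(a, 1) for a in averages])                      # [41.7, 46.7, 53.3, 58.3]
print([round(c, 1) for c in recover_individual(averages)])  # [25.0, 40.0, 60.0, 75.0]
```

So in this idealized case the averages keep the ranking and compress the spread toward the class mean, and the original scores are fully recoverable. Whether anything like that survives a less tidy model of teamwork is a separate question.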

Carleas, you can test your own methods by merely inventing some proposed actual performance measurements and then apply your proposed theory to reveal individual performance and see if you come up with the same performance numbers.

But basically, you are doing the original sin thing that has cost all of Mankind ALL of its troubles: overextending your reach in an effort to remotely control too much. You want to use a few simple high order group numbers to judge and peg individuals associated with the group. That is no different than England wanting to control the Colonies and every other travesty of justice for thousands of years. It is sociopathic.