What is reliability, and what influence does it have on psychological testing? Reliability refers to the consistency of a measure. For example, if a test is designed to measure a trait such as introversion, then each time the test is administered to a subject, the results should be approximately the same.
Unfortunately, it is impossible to calculate reliability exactly, but it can be estimated in a number of different ways. Test-retest reliability is a measure of the consistency of a psychological test or assessment across time. It is best suited to things that are stable over time, such as intelligence.
Test-retest reliability is measured by administering a test twice at two different points in time. This type of reliability assumes that there will be no change in the quality or construct being measured.
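In practice, the test-retest estimate is simply the correlation between the two sets of scores. Here is a minimal sketch in Python; the scores and variable names are hypothetical, invented for illustration:

```python
import numpy as np

# Hypothetical scores for eight people who took the same test twice,
# a few weeks apart (made-up data for illustration).
time_1 = np.array([24, 31, 18, 27, 22, 35, 29, 20])
time_2 = np.array([26, 30, 17, 29, 21, 33, 30, 22])

# The Pearson correlation between the two administrations serves as
# the test-retest reliability estimate.
r = np.corrcoef(time_1, time_2)[0, 1]
print(f"Test-retest reliability estimate: r = {r:.2f}")
```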
The test-retest method is just one of the ways that can be used to determine the reliability of a measurement; other techniques include inter-rater reliability, internal consistency, and parallel-forms reliability. It is important to note that test-retest reliability refers only to the consistency of a test, not necessarily to the validity of the results. Inter-rater reliability, by contrast, is assessed by having two or more independent judges score the same test. One way to test inter-rater reliability is to have each rater assign each test item a score.
For example, each rater might score items on a numeric rating scale. Next, you would calculate the correlation between the two sets of ratings to determine the level of inter-rater reliability. Another means of testing inter-rater reliability is to have raters determine which category each observation falls into and then calculate the percentage of agreement between the raters. Parallel-forms reliability is gauged by comparing two different tests that were created using the same content.
The two tests should then be administered to the same subjects at the same time. Internal consistency reliability, in contrast, is used to judge the consistency of results across items on the same test. When you see a question that seems very similar to another question on the same test, it may indicate that the two questions are being used to gauge reliability. Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same way, which would indicate that the test has internal consistency.
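One standard way to quantify internal consistency, not named in the text above but widely used, is Cronbach's alpha, which compares the variability of individual items to the variability of total scores. A minimal sketch, with made-up responses:

```python
import numpy as np

# Hypothetical responses: rows = respondents, columns = items that are
# all intended to measure the same construct (made-up data).
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores

# Cronbach's alpha: values closer to 1 indicate more consistent items.
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```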
There are a number of different factors that can have an influence on the reliability of a measure. First and perhaps most obviously, it is important that the thing that is being measured be fairly stable and consistent. Aspects of the testing situation can also have an effect on reliability. For example, if the test is administered in a room that is extremely hot, respondents might be distracted and unable to complete the test to the best of their ability.
As noted above, reliability cannot be calculated exactly; instead, we have to estimate it, and this is always an imperfect endeavor.
Here, I want to introduce the major reliability estimators and talk about their strengths and weaknesses. There are four general classes of reliability estimates, each of which estimates reliability in a different way. They are: inter-rater (or inter-observer) reliability, test-retest reliability, parallel-forms reliability, and internal consistency reliability. Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get are reliable or consistent.
People are notorious for their inconsistency. We are easily distractible. We get tired of doing repetitive tasks. We daydream. We misinterpret. So how do we determine whether two observers are being consistent in their observations? You probably should establish inter-rater reliability outside of the context of the measurement in your study. There are two major ways to actually estimate inter-rater reliability. If your measurement consists of categories — the raters are checking off which category each observation falls in — you can calculate the percent of agreement between the raters.
Suppose there were 100 observations and that, for each one, the raters could check one of three categories. Imagine that on 86 of the 100 observations the raters checked the same category; the percent of agreement would then be 86%.
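Computing percent agreement is straightforward: count the observations on which both raters chose the same category and divide by the total. A small sketch with hypothetical categorical ratings (the category labels are invented for illustration):

```python
import numpy as np

# Hypothetical categories assigned by two raters to the same ten
# observations (the labels are invented for illustration).
rater_a = np.array(["on-task", "off-task", "on-task", "other", "on-task",
                    "off-task", "on-task", "on-task", "other", "off-task"])
rater_b = np.array(["on-task", "off-task", "on-task", "on-task", "on-task",
                    "off-task", "other", "on-task", "other", "off-task"])

# Fraction of observations on which the raters agree, as a percent.
percent_agreement = (rater_a == rater_b).mean() * 100
print(f"Percent agreement: {percent_agreement:.0f}%")  # 80% for this data
```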
The other major way to estimate inter-rater reliability is appropriate when the measure is continuous. There, all you need to do is calculate the correlation between the ratings of the two observers. For instance, they might be rating the overall level of activity in a classroom on a 1-to-7 scale, giving their ratings at regular time intervals. The correlation between these ratings would give you an estimate of the reliability, or consistency, between the raters.
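A minimal sketch of this correlation approach, using hypothetical 1-to-7 activity ratings from two observers (all data invented for illustration):

```python
import numpy as np

# Hypothetical 1-to-7 classroom-activity ratings made by two observers
# at the same regular time intervals (made-up data).
observer_1 = np.array([3, 5, 4, 6, 2, 5, 7, 4, 3, 6])
observer_2 = np.array([4, 5, 4, 6, 3, 4, 7, 5, 3, 5])

# The correlation between the two series estimates inter-rater reliability.
r = np.corrcoef(observer_1, observer_2)[0, 1]
print(f"Inter-rater reliability estimate: r = {r:.2f}")
```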
As a practical example, I used to work in a psychiatric unit where every morning a nurse had to do a ten-item rating of each patient on the unit. Because the same nurse was not on duty every day, the nurses met regularly to compare and discuss their ratings so that any of them would rate patients comparably. Although this was not an estimate of reliability, it probably went a long way toward improving the reliability between raters. We estimate test-retest reliability when we administer the same test to the same sample on two different occasions.
This approach assumes that there is no substantial change in the construct being measured between the two occasions. The amount of time allowed between measures is critical: we know that if we measure the same thing twice, the correlation between the two observations will depend in part on how much time elapses between the two measurement occasions.
The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation. This is because the two observations are related over time: the closer the two measurement occasions are in time, the more similar the factors that contribute to error.
Since this correlation is the test-retest estimate of reliability, you can obtain considerably different estimates depending on the interval. In parallel-forms reliability you first have to create two parallel forms; one way to do this is to create a large set of questions that all address the same construct and then randomly divide them into two sets. A closely related approach, the split-half method, is a quick and easy way to establish reliability. However, it is only effective with large questionnaires in which all questions measure the same construct, which means it is not appropriate for tests that measure different constructs.
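To make the split-half idea concrete, here is a minimal sketch in Python. The data and the odd/even split are invented for illustration, and the Spearman-Brown correction applied at the end is a standard adjustment (not named in the text above) for the fact that each half is only half as long as the full test:

```python
import numpy as np

# Hypothetical responses: rows = respondents, columns = items, all
# intended to measure a single construct (made-up data).
scores = np.array([
    [4, 5, 4, 5, 3, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 4, 5, 5],
    [3, 3, 3, 4, 2, 3],
    [1, 2, 2, 1, 2, 1],
])

# Split the items into two halves (here: odd vs. even item positions).
half_1 = scores[:, 0::2].sum(axis=1)
half_2 = scores[:, 1::2].sum(axis=1)

# Correlate the half scores, then apply the Spearman-Brown correction.
r = np.corrcoef(half_1, half_2)[0, 1]
split_half = (2 * r) / (1 + r)
print(f"Split-half reliability (Spearman-Brown): {split_half:.2f}")
```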
The Minnesota Multiphasic Personality Inventory (MMPI), for example, has subscales measuring different behaviors, such as depression, schizophrenia, and social introversion, so the split-half method would not be an appropriate way to assess the reliability of this personality test. The test-retest method assesses the external consistency of a test: it measures the stability of a test over time. Examples of appropriate tests include questionnaires and psychometric tests. A typical assessment would involve giving participants the same test on two separate occasions.
If the same or similar results are obtained, then external reliability is established. A disadvantage of the test-retest method is that it takes a long time for results to be obtained; Beck et al. (1996), for example, used this method to assess the test-retest reliability of the Beck Depression Inventory. The timing of the retest is important: if the interval is too brief, participants may recall information from the first test, which could bias the results.
Alternatively, if the interval is too long, it is feasible that the participants could have changed in some important way, which could also bias the results. Inter-rater reliability refers to the degree to which different raters give consistent estimates of the same behavior, and it can be used for interviews. Note that it can also be called inter-observer reliability when referring to observational research, in which researchers observe the same behavior independently, to avoid bias, and then compare their data. If the two sets of data are similar, the measure is considered reliable.