How can I identify and fix errors in data entry, outliers, and missing data?
Response Guidelines
Provide a substantive contribution that advances the discussion in a meaningful way by identifying strengths of the posting, challenging assumptions, and asking clarifying questions. Your response is expected to reference the assigned readings, as well as other theoretical, empirical, or professional literature to support your views and writings. Reference your sources using standard APA guidelines.
Peers have responded to:
Data Screening
For this discussion, identify the goals of data screening. Then discuss how you can identify and remedy the following:
Errors in data entry.
Outliers.
Missing data.
PEER RESPONSE 1
Cait Bahr
Warner (2013) discusses the many goals of data screening in Chapter 4 of Applied Statistics: From Bivariate Through Multivariate Techniques. Most importantly, data screening is used to detect problems that could affect later data analyses (Warner, 2013). In data screening, it is necessary to identify all errors in data collection; otherwise, the results will be distorted and therefore incorrect. Other goals of data screening include correcting data errors, determining which data are missing, and finding extreme outliers (Warner, 2013). Data screening is needed for most data sets because significant problems can occur when data are processed (Warner, 2013). Some notable problems with data include extreme outliers, missing data, errors in data entry, small sample sizes, and nonlinear relationships in the data (Warner, 2013).
When identifying and remedying issues, it is important to fix errors in data entry, locate outliers, and handle any missing data (Warner, 2013). To address errors in data entry, all of the data must first be proofread carefully (Warner, 2013). To ensure that the correct values are entered, it is wise to compare the entered data against the original records from the study (Warner, 2013). The entered numbers should also be compared to logbooks and checked by more than one person (Warner, 2013). Outliers are values that are extremely high or low, or that otherwise look out of place relative to the rest of the data (Warner, 2013). So that outliers do not distort the results, many researchers remove them before running the analysis (Warner, 2013). Lastly, there are many ways to handle missing data (Warner, 2013). For one, SPSS flags empty cells as system-missing values (Warner, 2013). Rather than treating these cells as valid data, the software either excludes them from the analysis or, if the researcher requests it, replaces them with an estimated value (Warner, 2013). However the missing data are handled, researchers should report the extent of missing data in their summary of findings (Warner, 2013).
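The idea of comparing entered data with the original records, with more than one person involved, is often put into practice as double data entry: two people key in the same records independently, and every disagreement is traced back to the source sheet. The following is a minimal Python sketch, assuming the data are held as a list of dictionaries (one per participant); the function name `compare_entries` and all sample values are hypothetical.

```python
# Double data entry check: the same paper records are keyed in twice,
# and every cell where the two versions disagree is flagged for review.
# The records, variable names, and function name are hypothetical.

def compare_entries(entry_a, entry_b):
    """Return (row, variable, value_a, value_b) for each cell where two entries disagree.

    Each entry is a list of dicts, one dict per participant, keyed by variable name.
    """
    if len(entry_a) != len(entry_b):
        raise ValueError("The two entries contain a different number of records")
    mismatches = []
    for row, (rec_a, rec_b) in enumerate(zip(entry_a, entry_b)):
        # Check every variable that appears in either version of the record.
        for var in sorted(set(rec_a) | set(rec_b)):
            if rec_a.get(var) != rec_b.get(var):
                mismatches.append((row, var, rec_a.get(var), rec_b.get(var)))
    return mismatches


first_entry = [
    {"id": 1, "age": 34, "score": 18},
    {"id": 2, "age": 29, "score": 22},
    {"id": 3, "age": 41, "score": 15},
]
second_entry = [
    {"id": 1, "age": 34, "score": 18},
    {"id": 2, "age": 92, "score": 22},  # digits transposed while keying
    {"id": 3, "age": 41, "score": 15},
]

for row, var, a, b in compare_entries(first_entry, second_entry):
    print(f"row {row}, {var}: first entry = {a}, second entry = {b}")
```

Only the flagged cells need to be checked against the paper originals, which is much faster than re-reading every value.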
Reference
Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.
PEER RESPONSE 2
Teddrick McCreary
What are the goals of data screening? When we work with numbers and data, there can be many kinds of errors, measurement problems, inconsistencies, missing values, or outliers. Whenever we research any population, we need to run preliminary analyses on the data we have collected (Warner, 2013). Personally, I find it easiest to think of data screening this way: it is the process of inspecting the collected data for errors and correcting those errors before the final analysis. This is done by checking over the data, identifying any outliers, and then dealing with any missing data.
How can I identify and fix errors in data entry, outliers, and missing data?
• Errors in data entry: These can often be reduced by knowing where your information comes from, understanding how any experiments were run, and staying aware that errors may be present (Warner, 2013). In my own work, I collect much of my data through anecdotal observation: I am in the classroom, observing and recording the frequencies of behaviors, new behaviors, and types of behaviors. One of the main sources of error in my data would be reactivity. Because the students are aware of my presence, they may act differently than they normally would. They may be very happy that I am there and behave differently because of that, or I could be making them more anxious than usual. Either way, I will get different numbers than I would if I left it to the staff to collect and document the data I need.
• Outliers: An outlier is a value that is extremely different from the other values in the same sample, population, or group of individuals being surveyed (Warner, 2013). An outlier can be important because it alerts the researcher to an abnormality or to an error in the measurements. When we find an outlier, we then need to decide whether to keep it or remove it. Sometimes it is acceptable to exclude the outlier; at other times, the outlier could be revealing a new trend or discovery in the information we are examining.
• Missing data: Typically, we do not find missing data until we enter the information into SPSS. Once the data have been entered, there are two types of missing data: system-missing values and values missing because of human error (Warner, 2013). There are a few things we can do with missing data. One option is simply to leave it alone. This may be reasonable when the missing values are few and random, or when the item is part of a composite that can be created by averaging the available items into a new variable. One thing to keep in mind is that, with missing data, SPSS will use either listwise or pairwise deletion. Another option is to delete the cases with missing values, so that we have complete data for every subject in the study (Psychwiki, 2011).
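The practical difference between listwise and pairwise deletion is easiest to see by counting which cases each approach keeps. Listwise deletion drops a case if it is missing a value on any variable; pairwise deletion keeps a case for every analysis in which both variables involved are present. The following Python sketch reproduces only that case-selection logic, not SPSS itself; the variable names and values are invented, and `None` stands in for a missing cell.

```python
# Hypothetical data set: None marks a missing value
# (roughly what SPSS shows as a system-missing cell).
data = {
    "anxiety":    [12, 15, None, 9, 14, 11],
    "sleep":      [7, None, 6, 8, 5, 7],
    "motivation": [3, 4, 5, None, 2, 4],
}


def listwise_cases(data):
    """Indices of cases that have no missing value on any variable."""
    n = len(next(iter(data.values())))
    return [i for i in range(n) if all(col[i] is not None for col in data.values())]


def pairwise_cases(data, var_x, var_y):
    """Indices of cases that are complete on just the two variables being analyzed."""
    return [i for i, (x, y) in enumerate(zip(data[var_x], data[var_y]))
            if x is not None and y is not None]


print("Listwise N:", len(listwise_cases(data)))
for x, y in [("anxiety", "sleep"), ("anxiety", "motivation"), ("sleep", "motivation")]:
    print(f"Pairwise N for {x} and {y}:", len(pairwise_cases(data, x, y)))
```

Here listwise deletion keeps 3 of 6 cases, while each pairwise analysis keeps 4. The trade-off is that pairwise results rest on slightly different subsets of participants for each pair of variables.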
References
Psychwiki. (2011). Dealing with missing data. Retrieved from http://www.psychwiki.com/wiki/Dealing_with_Missing_Data
Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.
My Initial Response:
Data Screening
The goals of data screening include:
To check for outliers
To check for missing data
To check for errors in data entry
Outliers can be spotted by plotting either a boxplot or a histogram of the variable. One remedy is to omit them from the analysis.
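Boxplots commonly flag points using Tukey's rule: a value is drawn as an outlier if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The following minimal Python sketch applies that rule directly; the reaction-time values are hypothetical, and because software packages compute quartiles in slightly different ways, borderline cases may be classified differently than in SPSS.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3 (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; the middle value is the median
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical reaction times in milliseconds; 1250 is the suspicious value.
reaction_times = [412, 398, 430, 405, 421, 389, 415, 1250, 402, 417]
print(iqr_outliers(reaction_times))
```

Flagging is only the first step. A flagged value should be checked against the source data before deciding whether it is an entry error to correct or a genuine extreme score.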
Missing data can be identified by exploring the data set before beginning the analysis. You can opt to delete the cases with missing values altogether or, if deletion would substantially reduce the sample size, fill in a value such as the mean.
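The two options above, deleting incomplete cases or filling in the mean, can be compared on a small example. The following Python sketch uses hypothetical scores and a hypothetical helper, `mean_impute`. One well-known caution is also visible in the output: mean imputation leaves the mean unchanged but makes the variable look less spread out than it really is, because every filled-in value sits exactly at the mean.

```python
from statistics import mean, pstdev


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical scores with two missing responses.
scores = [14, None, 18, 20, None, 16]

# Option 1: delete the missing cases (4 of 6 cases remain).
complete_cases = [v for v in scores if v is not None]

# Option 2: mean imputation (all 6 cases kept; gaps filled with 17, the observed mean).
imputed = mean_impute(scores)

print("Complete cases:", complete_cases)
print("Mean-imputed:", imputed)
# The mean is the same either way, but the standard deviation shrinks after imputation.
print("SD after deletion:", round(pstdev(complete_cases), 3))
print("SD after imputation:", round(pstdev(imputed), 3))
```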
Errors in data entry can be spotted by exploring the data set before analysis, cross-checking it against the source data, or plotting a histogram and looking for inconsistent values. These errors can be reduced by being careful during data entry or by using digital data capture tools.
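Looking for inconsistent values often comes down to a range check against the codebook: a value that is impossible for a variable (an age of 230, or a 44 on a 1 to 5 scale) is almost certainly a keying error. The following minimal Python sketch assumes a hypothetical codebook of valid ranges; the variable names, ranges, and records are invented for illustration.

```python
# Hypothetical codebook: the smallest and largest valid value for each variable.
VALID_RANGES = {
    "age": (18, 65),
    "likert_q1": (1, 5),
    "condition": (1, 3),
}


def out_of_range(records, valid_ranges):
    """Return (case_id, variable, value) for every value outside its allowed range."""
    problems = []
    for rec in records:
        for var, (low, high) in valid_ranges.items():
            value = rec.get(var)
            # Missing values are skipped here; they are handled as missing data instead.
            if value is not None and not (low <= value <= high):
                problems.append((rec["id"], var, value))
    return problems


records = [
    {"id": 101, "age": 23, "likert_q1": 4, "condition": 2},
    {"id": 102, "age": 230, "likert_q1": 3, "condition": 1},  # extra digit typed
    {"id": 103, "age": 45, "likert_q1": 44, "condition": 3},  # key pressed twice
]

for case_id, var, value in out_of_range(records, VALID_RANGES):
    print(f"case {case_id}: {var} = {value} is outside the valid range")
```

A range check catches impossible values but not plausible wrong ones (for example, a 3 keyed as a 4 on a 1 to 5 scale), so it complements, rather than replaces, cross-checking against the source data.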
References
Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.