Social Stratification
in Eastern Europe After 1989:
General Population Survey
Study Design
[These reports are as yet only partly edited to improve the English. In addition, some additional information has been requested but not yet received. Any errors or omissions discovered by readers should be reported to Treiman.]
Field Work Report (by Tsvetan Markov)
A. SAMPLING
The SSEE survey in Bulgaria used a two-stage cluster sample with stratification by district, municipality, and size of voting section. This is a mixed sampling procedure that combines the advantages of stratified and cluster sampling.
The sampling unit at the first stage was the "voting section". There are 12,500 such sections in Bulgaria, with an average size of 540 voters. The list of sections was stratified by the 31 districts and, within districts, by municipality and size of section. The 300 voting sections (clusters) were selected at random within districts, in accordance with the size of the population of the district and of the section.
The sampling unit at the second stage of sampling was the individual (address). For this purpose we used the Bulgarian population register (known as ESGRAON). The second stage of sampling was carried out by the Central Statistical Office. Individuals were randomly selected from the population register. To avoid any periodicity in the lists of voters, subsamples of the lists were randomly sorted by different criteria--address, name, individual ID number (ESGRAON), and number in the voters' list. We received 6,500 addresses in 298 clusters from the Central Statistical Office. From two clusters--the small villages Bersin (41 voters) and Sini Vrah (59 voters)--we could not obtain any persons because of the very small size of the voting units.
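The allocation of clusters to districts in proportion to population can be sketched as follows. This is only an illustration: the district names and voter counts are hypothetical, and the largest-remainder rounding rule is an assumption, since the report does not say how fractional allocations were handled.

```python
def allocate_clusters(strata_sizes, total_clusters):
    """Allocate clusters to strata in proportion to size (largest-remainder rounding)."""
    total = sum(strata_sizes.values())
    exact = {name: total_clusters * size / total for name, size in strata_sizes.items()}
    # Start from the integer part of each exact share.
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total_clusters - sum(alloc.values())
    # Give the remaining clusters to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical voter counts for three districts (not taken from the report).
example_districts = {"District A": 1_000_000, "District B": 500_000, "District C": 250_000}
allocation = allocate_clusters(example_districts, 30)
print(allocation)
```

With real data the same function would be called with the 31 district populations and a total of 300 clusters.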
B. DATA COLLECTION AND RESPONSE RATE
The field research began on June 1, 1993, and ended, outside Sofia, on July 31, 1993. At this stage of the survey we had 4,600 completed interviews. About 200 interviewers participated in the data collection.
The survey in Sofia started badly: the response rate fell to 45%, and on July 15 I stopped the field work in Sofia. There were two main reasons for our problems in Sofia. The first was the bad season for interviewing. The second was that we had to do 900 interviews in Sofia, which required about 50 well-motivated interviewers. The rest of the interviews in Sofia were done at the beginning of September. For this wave of the fieldwork I used only the best interviewers.
Of the 6,109 addresses that were visited (see Appendix A of the Bulgarian report), we obtained 4,921 completed interviews--a response rate of 81 per cent. In my opinion this is a very good result, keeping in mind the sample size and the summer season, which is not very suitable for data collection.
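The response rate quoted above follows directly from the two counts in the text. A minimal check, using only those two figures:

```python
addresses_visited = 6109     # addresses visited, from the report
interviews_completed = 4921  # completed interviews, from the report

response_rate = interviews_completed / addresses_visited
print(f"Response rate: {response_rate:.1%}")
```

Rounded to whole per cent, this gives the 81 per cent reported.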
The response rate differed significantly by region - from 66 per cent in Sofia to 90 per cent in the Razgrad and Haskovo districts. During the field work I had to over-sample the districts with the lowest response rates (a total of 613 supplementary addresses). The over-sampling was done mainly in Sofia (in 11 clusters) and in Kurdzali (in five clusters). By type of locality the response rates were: 65.6 per cent in Sofia; 82.2 per cent in the six largest cities in Bulgaria (excluding Sofia); 78.9 per cent in the remaining district centers; 84.9 per cent in towns, and 86.3 per cent in villages.
The number of interviews by district is shown in Table 1.
Census means the percentage of the total population living in the respective district, according to the census of December 1992. (Data from the 1992 Census are currently available only for the total population.)
Table 1.--Comparison of Distribution of Survey Responses with Distribution of Population.
The data show that some districts are over-represented. These are the districts with a Turkish population (Kurdzhali, Haskovo, Shumen) and Sofia. The Turkish population is over-represented in the sample by about 3.0% at the level of clusters (the aggregate level). Other districts, such as Burgas, Vratsa and Dobrich, are under-represented. The most significant under-representation is that of some "red" regions, that is, regions with a predominantly red (socialist) vote, mainly the pair Vratsa-Montana.
My experience with this type of sample shows that over- and under-representation of some districts does not affect results at the aggregate level. But to be precise, it may be appropriate to weight the sample. I will know whether I want to weight the sample only after I have seen the survey data on education, ethnicity and voting.
B1. RESPONSE RATE BY STATUS
A country-specific variable "STATUS IN THE SAMPLE" was included in Appendix A. The average cluster size was 21 names. The interviewers were allowed to visit the first 17 or 18 addresses (STATUS OK). In this way the response rate was limited to 82%. The last four addresses were reserves (STATUS RESERVE). The interviewers had the right to visit these addresses when a respondent from the main list was not available for interview (due to illness or change of address; see the instructions) and when the respondent from the reserve list was of the same sex and 5-year age group (20-24, 25-29, ..., 65-69). The aim of this procedure was to preserve the structure of the sample by age, sex and region.
Status OVER SAMPLE indicates over-sampling done during the field work.
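The substitution rule for reserve addresses (same sex, same 5-year age group) can be sketched as below. The record layout and the choice of taking the first matching reserve are assumptions made for illustration; the report gives only the matching criteria.

```python
def age_group(age):
    """Lower bound of the 5-year age group (20-24 -> 20, 25-29 -> 25, ...)."""
    return (age // 5) * 5


def can_substitute(main, reserve):
    """A reserve may replace a main-list person of the same sex and 5-year age group."""
    return main["sex"] == reserve["sex"] and age_group(main["age"]) == age_group(reserve["age"])


def find_reserve(main, reserves):
    """Return the first eligible reserve, or None if no reserve matches."""
    for person in reserves:
        if can_substitute(main, person):
            return person
    return None


# Hypothetical records (not from the survey).
unavailable = {"sex": "F", "age": 33}
reserve_list = [{"sex": "M", "age": 31}, {"sex": "F", "age": 36}, {"sex": "F", "age": 30}]
replacement = find_reserve(unavailable, reserve_list)
print(replacement)
```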
Table 2
The data show that status RESERVE has a significantly lower response rate. This confirms our view that increasing the cluster size does not lead to better results and does not increase the response rate. 850 addresses from the original sample were not used in the survey.
B2. RESPONSE RATE BY TYPE AND SIZE OF LOCALITY
A country-specific variable "TYPE OF CLUSTER" was included in Appendix A. Code 2, LARGE CITY, refers to the six largest cities in Bulgaria (Plovdiv, Stara Zagora, Burgas, Varna, Russe and Pleven), in which 15.1% of the total population lives.
The data in Table 3 show that response rates differ significantly by type and size of locality.
TABLE 3
TABLE 4
Data from Census '92 refer to the total population. The structure of the sample was calculated for the population over 18 years, on the basis of results from the previous census, conducted in 1986. In those six years the population of Bulgaria decreased by about 500,000 persons owing to high mortality (1.9% per year) and emigration: about 430,000 persons left the country. The decline in population is most significant in the district centers (due to emigration) and in the villages (due to mortality). These processes affected the distribution of the population and, for this reason, the structure of the sample.
TABLE 5.--Percentage distribution of responses by size of locality.
Data from the Census are available only for the rural population (32.4%) and for cities above 100,000 persons. These cities contain 31.7% of the population (30.7% of completed interviews).
C. AGE AND ETHNICITY
According to data from Census '92, the population between 20 and 69 years is 5,541,785 persons. The percentages of the age groups are shown in Table 6. The average age of the sample is 44.526 years with SD 14.0. The average age of our respondents is 44.224 years with SD 13.766. The final sample became slightly younger owing to the over-sampling of age groups under 40 years.
TABLE 6
Census data on age are available in ten-year groups only.
Data from Census '92 on ethnic origin and mother tongue have prompted a wide discussion in Bulgaria about the validity of these data.
Preliminary data on the ethnic origin of the total population are shown in Table 7.
TABLE 7
Bulgarian statisticians applied the best methodology during the census, and Bulgarian-Muslims (about 2.5-3.0% of the population) were counted as Turks. Many Gypsies identified themselves as Turks. For this reason the Bulgarian Parliament decided to suspend the Census results for some municipalities with a predominantly Bulgarian-Muslim population (mainly in the Blagoevgrad district).
So-called Turks were over-represented in the SSEE survey by 3.5% (7 clusters with a predominantly Turkish population). This occurred for the following two reasons. First, when I ordered the sample from ESGRAON, colleagues told me that they usually reduced the share of Turkish clusters (votes) in samples from 7.5% to 6.0%. The reason is that about 120,000 Turks left the country between Nov. 1991 and Dec. 1992. I asked them not to do this because the sample is very large and it would be good to have a small reserve. Second, when the survey started, the interviewers from Kurdjali managed to complete only 5 to 7 interviews in five Turkish clusters. The reason was that the people on the lists had left the country but the local authorities (elected from the MRF) kept them on the voting lists. I gave the interviewers 5 clusters from the second sample I had, and they were able to find almost all of the respondents from the second sample. I did not believe that they could do more than 10 interviews per cluster, but they did, and Turks were over-represented.
In my opinion Turks in the SSEE survey have to be weighted.
D. RESULTS FROM ELECTIONS IN 91 AND SSEE SURVEY
In Table 8 we compare the results of the last parliamentary elections, held in October 1991, with the distribution of votes for the main political parties in the SSEE survey. Votes in the clusters were weighted by the number of completed interviews in each cluster.
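The weighting described here amounts to a weighted average of the cluster-level election results, with the number of completed interviews as the weight. A minimal sketch, with a hypothetical party label and made-up cluster figures:

```python
def weighted_vote_share(clusters, party):
    """Average a party's cluster-level vote share, weighted by completed interviews."""
    total_interviews = sum(cluster["interviews"] for cluster in clusters)
    weighted_sum = sum(cluster["interviews"] * cluster["shares"][party] for cluster in clusters)
    return weighted_sum / total_interviews


# Hypothetical clusters: official vote share of "Party X" and completed interviews.
example_clusters = [
    {"interviews": 20, "shares": {"Party X": 0.40}},
    {"interviews": 10, "shares": {"Party X": 0.10}},
]
expected_share = weighted_vote_share(example_clusters, "Party X")
print(f"{expected_share:.1%}")
```

The weighted figure is what the survey distribution would be expected to reproduce if respondents voted like their voting sections as a whole.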
TABLE 8
E. SAMPLING ERROR
In theory, the sampling error for a sample of 6,000 persons should not exceed 1.5% (for simple random sampling). In cluster samples this error increases by 15-30%, i.e. to between 1.7% and 2.0%. The cluster samples of 2,000-2,500 persons that we usually use in Bulgaria give errors between 1.3% and 2.0%. But larger samples may sometimes give greater deviations.
Insofar as I can judge at this point, the maximal deviation which has been registered in the SSEE survey up to now is 1.8%, and in most cases varies between 0.5 and 1.0%.
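The figures in this section can be checked with the standard margin-of-error formula for a proportion under simple random sampling. The sketch below assumes a worst-case proportion of 0.5 and a 95% confidence level (z = 1.96); the report states neither, so these are illustrative choices. The 15-30% inflation is applied to the report's own 1.5% bound.

```python
import math


def srs_margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion p under simple random sampling of size n."""
    return z * math.sqrt(p * (1 - p) / n)


srs_error = srs_margin_of_error(6000)  # assumed p = 0.5, z = 1.96
stated_bound = 0.015                   # the 1.5% bound given in the report

# Inflate the stated bound by 15% and 30% for the clustering effect.
cluster_range = (stated_bound * 1.15, stated_bound * 1.30)

print(f"SRS margin of error for n=6000: {srs_error:.2%}")
print(f"Cluster-sample range: {cluster_range[0]:.2%} to {cluster_range[1]:.2%}")
```

Under these assumptions the simple-random-sampling error stays below the 1.5% bound, and the inflated range matches the 1.7-2.0% quoted above.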
F. DISPOSITIONS AND REASONS FOR UNCOMPLETED INTERVIEWS
ATTEMPT 1: Distribution of failures to complete interview on first attempt
ATTEMPT 2: Distribution of failures to complete interview on second attempt.
ATTEMPT 3: Distribution of failures to complete interview on third attempt.
Data Processing Report (by Tsvetozar Tomov)
1. Organization
The information was processed between Oct. 5 and Nov. 15, 1993, in two different files corresponding to the first and second parts of the general population questionnaire. Total number of key operators: 17. Method of processing: using a codebook, through a text editor. A blank space is left between each pair of adjacent variables.
2. Control
2.1. Files processed in SPSS line by line. When an impossible code was found in the course of data entry, the key operator corrected the information.
2.2. After joining the files: a second check for impossible codes. The questionnaires in which an impossible code was discovered were recorded in a separate file.
2.3. Joining of the files and making SYS-files for both the first and the second part.
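The check for impossible codes in steps 2.1 and 2.2 can be illustrated as follows. The original work was done in SPSS; this Python sketch uses a hypothetical two-variable codebook and made-up records purely to show the logic.

```python
def find_impossible_codes(records, codebook):
    """Return (case_id, variable, value) for every value not allowed by the codebook."""
    problems = []
    for case_id, answers in records.items():
        for variable, valid_codes in codebook.items():
            value = answers.get(variable)
            if value not in valid_codes:
                problems.append((case_id, variable, value))
    return problems


# Hypothetical codebook and records (variable names and codes are illustrative).
example_codebook = {"SEX": {1, 2}, "MARITAL": {1, 2, 3, 4, 9}}
example_records = {
    101: {"SEX": 1, "MARITAL": 2},
    102: {"SEX": 3, "MARITAL": 9},  # SEX = 3 is not a valid code
}
flagged = find_impossible_codes(example_records, example_codebook)
print(flagged)
```

The flagged cases correspond to the questionnaires that were recorded in a separate file for correction.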
3. Difficulties
3.1. The main difficulty was the double classification of the place of residence codes. Since there is no easy way to check for illegal codes in these variables, there may be some undetected mistakes.
3.2. The coding of occupations was an unusual task for the key operators: they could not exercise logical control of the occupation variables.
Czech Republic (by Petr Matějů and Milan Tuček)
A. Sampling
In accordance with the agreement between the Czech Statistical Office and the Institute of Sociology of the Czech Academy of Sciences, the general population survey (SSEE) was conducted as a parallel survey to the Microcensus 1992 -- the statistical income survey -- conducted by the Czech Statistical Office. Microcensus 1992 was carried out in March and April 1993, on a random sample of 0.5% of households (housing units). The SSEE was conducted on an approximately 33% subsample of the addresses issued for Microcensus 1992, except in the case of Prague, where all addresses issued for Microcensus 1992 were also used for the SSEE survey.
a) The sampling procedure for the Microcensus survey
The sampling unit is the "housing unit", which is principally identical with the definition of "housing household" (living in one apartment). Housing units were selected using a two-stage random sampling procedure:
1. All census districts (census tracts) of the Czech Republic were classified into 8 categories (strata) according to the population size of the locality.
2. In each category (stratum) a random sample of "housing units" (addresses) was drawn from the Census 1991 (data collection in March, 1992). All addresses were checked for changes that occurred between the Census and the time of the Microcensus and SSEE survey -- that is between March 1992 and March 1993. Only valid addresses were issued for the Microcensus 1992 survey. The total number of addresses issued for the Microcensus was 18,598 (approximately 0.5 percent of all households).
b) The sampling procedure for the SSEE survey
Addresses: The addresses for the SSEE survey were randomly selected from the addresses issued for the Microcensus 1992. The sampling probability of 0.363 was translated into a mechanical sampling step of 3-3-3-2 (every third address three times in a row, then the second address after that, and so on repeatedly). In order to reach about 1,500 completed interviews in Prague (an over-sample for the analysis of residential mobility), all addresses issued for the Microcensus were also used for the SSEE. The total number of addresses issued for the SSEE (including the over-sample in Prague) was 8,316. Each address issued for the SSEE survey was assigned a statistical code for the county and a sequential number (pagina) representing the household within that county. These numbers were provided by the Regional Statistical Offices, and interviewers were not authorized to make any changes to these codes.
Individuals: Individuals in "housing units" were selected for the SSEE interviews randomly by the following procedure. After completing the questionnaire for Microcensus 1992, interviewers were asked to fill in the table on the special cover sheet attached to the questionnaire ("Evidenční list"), so that all persons living in the household and born between 1924 and 1973 (age 20-69) were ranked by age (starting with the oldest person). Then a random number -- defined as the third digit of the sequential number of the household -- was used to choose between two Kish tables (even number: Table A; odd number: Table B). Then the fourth digit of the sequential number of the household was used to select the appropriate row of the Kish table. The appropriate column of the table was defined by the total number of persons in the household. The figure in the cell of the Kish table was then used to determine the sequential number, on the list of household members, of the person to be interviewed. No substitutes were allowed if the selected person was not available or refused to be interviewed. The use of this sampling procedure was assessed ex post by analyzing the data from the cover sheets of questionnaires that were released for coding and data entry. Comparison of the theoretically correct selection with the actual selection made by the interviewer shows that 95.4% of selections were correct. Table 1 shows the correctness of actual selections by the sequential number of the selected person.
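The 3-3-3-2 step selects 4 addresses out of every 11, i.e. a fraction of 4/11, or about 0.364, in line with the stated probability. A minimal sketch of such a mechanical selection follows; the starting point (the third address on the list) is an assumption, since the report does not say where counting began.

```python
from itertools import cycle


def mechanical_sample(addresses, steps=(3, 3, 3, 2)):
    """Select addresses by repeatedly advancing through the list by the given steps."""
    selected = []
    position = -1  # zero-based index; the first step of 3 lands on the third address
    for step in cycle(steps):
        position += step
        if position >= len(addresses):
            break
        selected.append(addresses[position])
    return selected


first_block = mechanical_sample(list(range(1, 12)))  # addresses numbered 1..11
larger_list = mechanical_sample(list(range(1100)))
print(first_block)
print(len(larger_list) / 1100)
```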
Person     1      2      3      4      5   Total      N
2        5.2   94.4    0.4    0.0    0.0   100.0   1908
3        2.1    4.2   93.7    0.0    0.0   100.0    191
4        1.9    1.9    0.0   96.2    0.0   100.0     53
5        0.0    0.0    0.0    0.0    0.0   100.0      9
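The respondent selection procedure described above (third digit of the household number picks the table, fourth digit picks the row, household size picks the column) can be sketched as follows. The two tables here are placeholders generated by a simple rule, not the Kish tables actually used in the survey, and the limit of six eligible persons is likewise an assumption.

```python
MAX_ELIGIBLE = 6  # placeholder upper limit on eligible household members

# Placeholder tables: 10 rows (digits 0-9) by MAX_ELIGIBLE columns (household sizes 1..6).
# Each cell holds a rank between 1 and the household size. These are NOT the real Kish tables.
TABLE_A = [[(row % size) + 1 for size in range(1, MAX_ELIGIBLE + 1)] for row in range(10)]
TABLE_B = [[size - (row % size) for size in range(1, MAX_ELIGIBLE + 1)] for row in range(10)]


def select_respondent(sequence_number, birth_years):
    """Pick one eligible household member (born 1924-1973) from the placeholder tables."""
    # Rank eligible persons by age, oldest first (earliest birth year first).
    eligible = sorted(year for year in birth_years if 1924 <= year <= 1973)
    if not eligible:
        return None
    if len(eligible) > MAX_ELIGIBLE:
        raise ValueError("household larger than the placeholder tables allow")
    digits = f"{sequence_number:04d}"  # assumes digits are counted from the left
    third_digit, fourth_digit = int(digits[2]), int(digits[3])
    table = TABLE_A if third_digit % 2 == 0 else TABLE_B  # even -> A, odd -> B
    rank = table[fourth_digit][len(eligible) - 1]
    return eligible[rank - 1]


# Hypothetical household with sequential number 1274; the person born in 1985 is too young.
chosen = select_respondent(1274, [1948, 1925, 1971, 1985])
print(chosen)
```

Because the choice depends only on the household number and the household composition, the selection can be recomputed afterwards, which is how the ex post check of interviewer selections reported above is possible.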
Selected interviewers of the Czech Statistical Office were trained for both the Microcensus and for the SSEE survey. Only these specially trained interviewers obtained the addresses chosen both for the Microcensus and SSEE. Training of interviewers for the SSEE was conducted by the Institute of Sociology. Regarding the sequence of the interviews for the two parallel surveys, the general instruction was that the interview for the Microcensus was always the first one, followed by the sampling procedure determining the respondent for the SSEE. After the selection of the respondent was finished, there were two options for proceeding: either to conduct the interview for the SSEE (if selected person was present and willing to answer the questionnaire), or to make an appointment for another day.
The field work started on February 20, 1993, and ended April 4, 1993. The analysis of a special cover sheet shows some elementary parameters of the fieldwork. Of the questionnaires returned from the field, 60% were completed on the first attempt, that is, immediately after the interviewer finished the questionnaire for Microcensus 1992. Of the remaining 40% of cases, in 18% no one was at home, and in 22% the person selected for our survey was not available at the moment (not at home, didn't have time, etc.). Another 25% of the interviews were completed on the second attempt, and the remaining 15% were completed on the third attempt.
Regarding the length of the interview, the data from the cover sheet show that the actual time of the interview was far above the initial estimates. The average length was 1 hour and 44 minutes, with a standard deviation of 37 minutes. Table 2 displays the distribution of the length of interviews in major categories.
The overall response rate was 65%; in Prague it was only 48%, and in other regions about 73%. The overall response rate for the Microcensus was 84%.
The unusually low response rate in Prague was primarily due to generally negative popular attitudes towards any collection of personal data after the long-lasting political debate concerning the so-called "lustration law" (the law passed in 1992 according to which every candidate for selected positions in the state apparatus or for positions at higher levels of management in state firms or institutions should prove that he or she was not a member of the former secret police before 1989). Another reason people were reluctant to provide any personal information was the long public discussion that took place before the field work started about the law prepared for the parliament concerning personal responsibility for the "crimes of communism" and violations of human rights between 1948 and 1989. Therefore, the climate for sociological surveys was generally extremely unfavorable. On top of this, a political campaign against our survey started in the Czech press just a few days after the field work began. For these reasons the total number of completed interviews was much lower than originally expected, especially in Prague, where the effect of these political discussions and press campaigns was especially strong (we received only 1,198 instead of the expected 1,500 completed interviews).
To reach the required 1,500 completed interviews it was decided to conduct a second wave of data collection in Prague, which started on April 24, 1993 and ended May 14, 1993. For the second wave 500 randomly selected addresses were issued by the Czech Statistical Office from the reserve of addresses prepared for the Microcensus 1992 (the sample for the Microcensus was originally 1% of households, and only a few weeks before the data collection it was reduced for financial reasons to 0.5%). Interviewers with the best results in the first wave were hired for the second wave. The second wave of data collection in Prague ended with 239 completed interviews. After this, the total number of completed interviews for Prague reached 1,449 cases.
Table 2: Categories of interview length
Table 3: Response rate by regions ("kraj") - before the second wave in Prague
Final sample sizes:
Whole sample (including Prague over-sample): 5,621 cases
Prague (regular sample: 563; oversample: 884): 1,449 cases
Remainder of Czech Republic (regular sample): 4,194 cases
Principal reasons for non-response:
Complete information on the principal reasons for non-response and refusal will be available only after the release of data from Microcensus 1992 (in December 1993). Preliminary figures drawn from the cover sheets and from preliminary figures released by the Statistical Office will be presented here.
a) Non-response for Microcensus 1992:
- refusal 65%
- no person present in the apartment during the survey 30%
- other reasons (illness, inability to answer the questionnaire, etc.) 5%
Note: Due to the sampling design, in these cases the interview for SSEE was not possible.
b) Microcensus 1992 completed, non-response only for the SSEE:
- no person within the age limit (20-69) in the household 55%
- refusal 30%
- selected person not present for a longer time 10%
- illness, mental problems, etc. 3%
- questionnaire not completed 2%
C. Geographical and occupational coding
Geographical units
The coding of geographical units followed the rules endorsed by the international research team. The list of residential units classified in Czech statistics as "towns" was slightly modified by adding urban residential units that belonged to towns in the past and by adding residential units that belong to urban or city agglomerations. Each town was then assigned a county code (four digits) and a sequential number for the town within the county (three digits). Other residential units (villages) were coded by the county code and the common code 666.
Special codes:
7777 residential units abroad
  981 Hungary
  982 Poland
  983 former Soviet Union
  984 other country in "Eastern Europe" (former communist)
  985 Germany (former West Germany)
  986 Austria
  987 other country in "Western Europe"
  989 USA, Canada, Australia
  990 other
8888 information not available
  888 no fixed place, frequently on move
  999 information is missing, refused
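How a two-part place code of this kind might be interpreted is sketched below. The pairing of the four-digit codes 7777 and 8888 with the three-digit sub-codes is an assumption read off the layout of the list above, and the function name and the example county number are hypothetical.

```python
ABROAD = 7777         # four-digit code: residential units abroad
NOT_AVAILABLE = 8888  # four-digit code: information not available
VILLAGE = 666         # common three-digit code for villages

COUNTRY_CODES = {
    981: "Hungary",
    982: "Poland",
    983: "former Soviet Union",
    984: "other country in Eastern Europe (former communist)",
    985: "Germany (former West Germany)",
    986: "Austria",
    987: "other country in Western Europe",
    989: "USA, Canada, Australia",
    990: "other",
}
MISSING_CODES = {
    888: "no fixed place, frequently on move",
    999: "information is missing, refused",
}


def describe_place(county_code, unit_code):
    """Translate a (county, unit) code pair into a readable description."""
    if county_code == ABROAD:
        return "abroad: " + COUNTRY_CODES.get(unit_code, "unknown country code")
    if county_code == NOT_AVAILABLE:
        return "not available: " + MISSING_CODES.get(unit_code, "unknown reason code")
    if unit_code == VILLAGE:
        return f"village in county {county_code:04d}"
    return f"town {unit_code:03d} in county {county_code:04d}"


print(describe_place(7777, 986))
print(describe_place(3201, 666))  # 3201 is a made-up county number
```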
The geographical location of the respondent's current address was coded by the number of the so-called "urban block" (the smallest geographical and statistical unit). This coding will make it possible to create most of the analytical classifications of residential units (cadastral units, city districts, etc.). It will also allow us to match our data with aggregate census statistical data for these units. These statistical data may be obtained from the Prague Statistical Office on request and at reasonable costs.
Occupational coding
Occupational coding was based on the Czechoslovak version of the International Standard Classification of Occupations 1988 (ISCO-88), developed by the Czechoslovak Statistical Office and the Institute of Sociology, Czechoslovak Academy of Sciences (Matějů, Tuček, Voknerova) in 1991-1992. The four-digit Czechoslovak version of ISCO-88 was then modified by adding special codes suggested by Treiman and Szelényi for the analysis of the development of occupational structure before the collapse of the communist regime and of mobility during the post-communist transformation. Then a recoding procedure was developed for converting the Czechoslovak national coding system into both the standard international coding system and the international system modified at UCLA for our project. This coding scheme was sent to UCLA along with suggestions for modifications in the recoding scheme for the EGP classification. Frequencies of the EGP classification for respondent's current job are displayed in Table 4.
D. Comparison of basic distributions from SSEE with Census data
In spite of the relatively low response rate (before 1990 the average response rate was about 85% including Prague), some distributions show a reasonable similarity between the Census data and our sample (Table 5). Deviations in the percentages of the youngest and the oldest cohorts show that weighting may be necessary due to the effect of sampling through households (higher probability of selection for individuals from single-person households, mostly among the elderly).
Among the economically active population in the probability sample (i.e. without the Prague over-sample), we found 10.7% to be self-employed, 64% to be employees in the state or public sector, 6.5% to be members of cooperatives, 11% to be employees in enterprises recently privatized or currently being privatized, and 16% to be individuals employed in private companies.
Regarding membership in the Communist party, 16% of respondents both in the proportional and full samples admitted they were at one time or another members of the Communist party. This number corresponds with available statistical data.
Table 4: EGP class - respondent's current job
Note: Only for proportional random sample (without Prague over-sample).
Table 5: Elementary comparison of distributions with the Census
*) Data from the Census were calculated for the population over the age of 15. Data from SSEE are valid for the age 20-69.
About 3% of respondents declared they were fired from their jobs for political reasons at least once in their lives, and 6% of the respondents admitted that one of their parents was fired from his/her job for similar reasons. However, 11% of the respondents reported that their educational career was negatively influenced by the political situation (special Czech and Slovak questions). Regarding property "restitution," 11% of respondents answered that their parents, spouses or they themselves had received property confiscated from the family after 1948.
E. Post-field work data processing
The Czech data file was prepared for checking and cleaning in July 1993. There were three stages of checking and cleaning. First, those cases in which crucial questions or tables were not answered (life history, education, etc.) or the questionnaire showed apparent inconsistencies were removed from the data file. Second, obvious errors in coding or in data entry were checked against the questionnaires and corrected. Third, the process of logical controls and cross-checking for consistency between various answers started in September 1993.
This stage of cleaning revealed that the questionnaire was unusually difficult even for well-trained and experienced interviewers, mostly because of its length and complexity. Serious problems of consistency between answers were found, especially because of the many logically interconnected rosters (education + activity history + self-employment + part-time economic activity).
The first version of the data-file was mailed to UCLA on September 27, 1993. After sending the first version, some additional controls and cross-checking were performed which revealed several other inconsistencies that had to be corrected. The second version of the Czech-data file was mailed to UCLA on October 28, 1993 along with the file containing national specific variables (for questions that appeared only in the Czech and Slovak questionnaires).
The Czech team also developed SPSS commands creating international classifications from nation-specific coding schema (ISCO-88, education, forms of study, branch of industry, etc.). These SPSS commands were also sent to UCLA.
F. Funding and acknowledgements
All costs of the pretest, the preparation and production of instruments, data collection, coding and data entry for the general population survey SSEE were covered by the Institute of Sociology of the Academy of Sciences of the Czech Republic which has received support for this project from various sources. Major support for the project was received from the U.S. National Science Foundation (NSF) and from the Dutch National Science Foundation (NWO) -- both these grants were given to the principal investigators of the international research project (Donald J. Treiman and Ivan Szelényi - both from UCLA). Two other research grants were given to the Czech research team from the Grant Agency of the Czech Academy of Sciences (to Petr Mateju and Pavel Machonin). We also received a significant contribution from the Czech Ministry of Labor and Social Affairs.
Prague, October 27, 1993
Sampling
The sampling procedure of the general population survey.
We ordered eight representative samples of 1,250 persons each from the State Population Register Office; each represented the population over 18 years by age and sex. The settlements were chosen by Median, with the help of a special randomizing program. Each sample contained 120 sampling points, including districts of Budapest, towns and villages. We carried out the fieldwork with four of the samples, in four waves; it started on 31 March 1993 and finished on 9 June 1993. For every sample we used a supplementary sample.
Data collection and response rates
Selected and specially trained interviewers worked in the four waves. In the first and second waves we sent out letters to each person in the sample. In the last two waves the interviewers went out without letters. We made this change on the basis of feedback from the interviewers who were working in Budapest and the towns. Their experience with the letter was negative; they said that "the people have time to decide that they don't want to take part in the survey."
We used 7113 addresses: main addresses 68%; supplementary addresses 32%.
TABLE 1:
TABLE 2
TABLE 3
Men          47   47
Women        53   53

University   11    9
Secondary    24   21
Primary      65   70

18-30        21   20
31-40        19   21
41-50        19   18
51-60        16   16
Over 60      25   25
Post fieldwork
The questionnaires were coded by the interviewers. The data file was prepared for checking and cleaning in July 1993. The first step was to find impossible codes; next came the logical controls (skip patterns) and plausibility checks on the data. For the most important and most problematic variables we checked whether incorrect or implausible values came from the questionnaire or were data entry mistakes. We went back to every question where suspected mistakes were found.
The special Hungarian codes were sent to UCLA with the data file, and we finished the correction of occupation codes in March 1994.
A. Sampling
For the SSEE survey a multistage random sample was drawn from the National Register of Polish Citizens (Polish abbr.: PESEL). The sampling frame covers the whole population of Poland. PESEL data are continuously updated for changes in place of residence, emigration abroad, births, deaths, etc. In light of previous empirical experience, the register provides more accurate data for sampling purposes than the other available registers (i.e. those based on Census data).
In the first stage the population was divided into 17 strata: 9 urban and 8 rural. The urban strata consist of cities grouped according to the number of inhabitants. The division was based on the categories presented in Table 3. The rural population was divided into 8 regions by collapsing voivodeships, the administrative units of the highest level (the area of Poland is divided into 49 voivodeships). The population distribution across regions is given in Table 4.
The second stage of the sampling started with selecting a number of sampling units within each stratum. The first and second urban strata (cities above 500,000) consist of 5 cities (Warsaw, Łódź, Kraków, Poznań, and Wrocław). We established each city as a separate stratum and in each city drew a simple random sample of individuals. The size of each sample was proportional to the number of inhabitants. The same strategy was applied to the third urban stratum, covering 15 cities (200,000-499,999 inhabitants). The 6 remaining urban strata were processed in a different way. We decided to select a constant number of respondents in each city (20 cases). Then the number of cities was determined in proportion to the size of the stratum. Sampling with replacement was used to select cities. The selection probabilities were thus kept proportional to city size.
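Selecting cities with replacement and with probability proportional to size can be sketched as follows. The city names and sizes are hypothetical, and the treatment of a city drawn more than once (it simply receives 20 respondents per draw) is an assumption; the report does not describe that case.

```python
import random
from collections import Counter

RESPONDENTS_PER_DRAW = 20  # constant number of cases per selected city, from the text


def select_cities_pps(city_sizes, n_draws, seed=None):
    """Draw cities with replacement, with probability proportional to population."""
    rng = random.Random(seed)
    names = list(city_sizes)
    weights = [city_sizes[name] for name in names]
    draws = rng.choices(names, weights=weights, k=n_draws)
    return Counter(draws)  # number of times each city was drawn


# Hypothetical stratum of four cities (not from the report).
stratum = {"City A": 90_000, "City B": 60_000, "City C": 30_000, "City D": 20_000}
draw_counts = select_cities_pps(stratum, n_draws=5, seed=1)
respondents = {city: hits * RESPONDENTS_PER_DRAW for city, hits in draw_counts.items()}
print(respondents)
```

Combining size-proportional selection of cities with a fixed number of respondents per draw gives every individual in the stratum roughly the same overall chance of selection.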
The latter strategy was also applied for the 8 rural strata. As primary units we utilized counties. A county is an administrative unit covering a number of villages or a city. The area of Poland is divided into 3009 counties of which 856 are city-counties and 2153 are rural counties. The mean number of inhabitants in a rural county is about 6,800.
For all rural strata a cluster of the same size was established (20 cases) and then the number of counties (clusters) was selected proportionally to the stratum size (population of each county). After the counties were selected, a simple random sample of individuals was drawn independently in each county.
By simple random sampling of individuals we in fact mean systematic selection of individuals from the PESEL sampling frame. Individual records are ordered by an 11-digit identification number consisting of the 6-digit date of birth (year/month/day) followed by 5 digits assigned randomly. As a result there is no correlation between an individual's ID number and his/her place of residence. Thus, the systematic sample may be treated as equivalent to a simple random sample. In fact, it is even better, because it is implicitly stratified by age, reflecting the age distribution in a county or city.
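A minimal sketch of systematic selection from a frame ordered by ID number follows. The frame is simulated: the IDs only imitate the layout described above (birth date followed by random digits), and the function name, frame size, and sample size are assumptions made for illustration.

```python
import random

def systematic_sample(frame_ids, n, rng):
    """Take every k-th record from the frame sorted by ID, after a random start."""
    ordered = sorted(frame_ids)        # birth date leads the ID, so this orders by age
    step = len(ordered) // n           # sampling interval k (assumes len(frame) >= n)
    start = rng.randrange(step)        # random start within the first interval
    return [ordered[start + i * step] for i in range(n)]

rng = random.Random(1994)
# Hypothetical frame: 6-digit birth date (YYMMDD) plus 5 random digits, as strings.
frame = [
    "%02d%02d%02d%05d" % (rng.randint(25, 74), rng.randint(1, 12),
                          rng.randint(1, 28), rng.randrange(100000))
    for _ in range(1000)
]
sample = systematic_sample(frame, 20, rng)
```

Since the selected records are spread evenly over a list sorted by date of birth, every age band contributes in proportion to its size, which is the implicit stratification by age mentioned in the text.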
This procedure yielded a sample of 4896 individuals. Personal data with addresses were transferred to special selection forms used by interviewers in the field. The interviewers were not informed of the year of birth of the selected person. These records were kept confidential so that the interviewers' work could be checked.
B. Data collection
The field work started on June 1, 1994, and ended about July 15, 1994. Before going to the field all interviewers were intensively trained by the staff of the Center for Social Survey Research in the Institute of Philosophy and Sociology, Polish Academy of Sciences. In total, 290 interviewers were employed to conduct the SSEE interviews.
C. Response rates
The total response rate for the survey was 0.719. This figure varies slightly across demographic and territorial categories (Tables 1, 2, 3, and 4).
Table 1.--Distribution of sex in SSEE data compared with the population of Poland aged 20-69
Table 2.--Distribution of age in SSEE data compared with the population of Poland aged 20-69
Table 3.--Cities by size in SSEE data compared with the population of Poland aged 20-69 (the urban population constitutes 63 per cent of the Polish population aged 20-69; the rural population is shown in Table 4).
Table 4.--Villages by region in SSEE data compared with the population of Poland aged 20-69.
D. OCCUPATIONAL AND GEOGRAPHICAL CODING
Occupations were coded according to the 1988 ISCO classification. We used a Polish version of this classification developed and tested on various empirical data collected by the Institute for Social Studies, University of Warsaw. The Polish version is fully equivalent to ISCO 88 at the level of the most detailed units, but we enriched it with many occupational descriptions peculiar to the Polish labor market. The Polish version is implemented in a computer-assisted coding program, which enhanced the accuracy of coding the SSEE data.
To maintain comparability with the SSEE data from other countries, we converted the Polish ISCO codes into the expanded international ISCO codes shown in Appendix C in the data set that was sent to UCLA.
For coding geographical places we used the Residential Units Classification developed by the Polish Statistical Office. The classification schema is based on the administrative units of the lower level -- counties [gminy]. The number of counties in Poland is relatively large (3,009), which permits very precise distinctions between different geographical places.
The territorial codes used in the Residential Units Classification are 5-digit numbers. The first two digits identify voivodeships. The third digit provides information about the urban and administrative status of a county (main city in a voivodeship, other city, a district in a large city, or a village). For SSEE purposes we had to modify the Residential Units Classification by adding some special codes for other countries. The final version of the classification used is presented in Appendix F.
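The layout of the territorial code can be sketched as a small Python helper. The function name and the example code are hypothetical, and labelling the last two digits "unit" is an assumption, since the report does not say what they contain. Codes are kept as strings so that leading zeros are preserved.

```python
def split_territorial_code(code):
    """Split a 5-digit territorial code into the parts described in the text."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError("expected a 5-digit code, got %r" % code)
    return {
        "voivodeship": code[:2],   # first two digits
        "status": code[2],         # third digit: urban/administrative status
        "unit": code[3:],          # remaining digits (meaning assumed, not documented)
    }

parts = split_territorial_code("12345")   # hypothetical code, not a real place
```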
E. WEIGHTING
Because response rates vary across different combinations of categories of socio-demographic variables (i.e., interactions occur), we decided to develop a single post-stratification weight compensating for basic deviations from population characteristics. Three such characteristics were considered: sex, age category (as described in Table 2), and place of residence (size of a city for cities and region for villages; see Tables 3 and 4).
The way we constructed the weight was straightforward. We produced, first, a three dimensional table for population data by cross-classifying sex, age category, and place of residence. Then we produced the same table for SSEE results. By computing a ratio of numbers in corresponding cells of both tables we obtained weights for individuals in each cell. The final stage was normalization of weight values to the number of completed interviews (3520). It was done for convenience in making statistical tests.
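The construction described above can be sketched in Python as follows. The cell labels and counts are invented and the function name is hypothetical; the sketch assumes every sample cell also appears in the population table.

```python
from collections import Counter

def poststratification_weights(pop_counts, sample_cells):
    """Return one weight per respondent, normalized to sum to the sample size.

    pop_counts: dict mapping a cell (sex, age category, place) to a population count.
    sample_cells: list giving the cell of each respondent.
    """
    n = len(sample_cells)
    sample_counts = Counter(sample_cells)
    pop_total = sum(pop_counts.values())
    # Ratio of the population share to the sample share of each respondent's cell.
    raw = [(pop_counts[c] / pop_total) / (sample_counts[c] / n) for c in sample_cells]
    scale = n / sum(raw)               # normalize to the number of completed interviews
    return [w * scale for w in raw]

# Invented example with four cells and ten respondents.
population = {
    ("F", "20-29", "urban"): 300, ("M", "20-29", "urban"): 300,
    ("F", "20-29", "rural"): 200, ("M", "20-29", "rural"): 200,
}
sample = ([("F", "20-29", "urban")] * 4 + [("M", "20-29", "urban")] * 2 +
          [("F", "20-29", "rural")] * 2 + [("M", "20-29", "rural")] * 2)
weights = poststratification_weights(population, sample)
```

In this invented example urban women are overrepresented (40 per cent of the sample against 30 per cent of the population), so they receive a weight below 1, while urban men receive a weight above 1.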
Data were collected for a special sample of the Warsaw population in September-November 1994, using a questionnaire identical to the questionnaire used for the national survey with the exception of a few questions added to the end of the questionnaire for methodological studies being undertaken by the Polish research group (these are shown in Section IV). As per our agreement, the Warsaw sample was of a size such that, combined with the Warsaw cases from the national sample, it would yield a minimum of 1,500 cases for Warsaw. In fact, 1360 cases were collected in the special survey and 143 cases from the national survey were from Warsaw, for a total of 1,503 cases. The special sample is distinguished from the Warsaw portion of the main sample by the variable WSAMPLE, which has code 401 for the Warsaw cases from the national sample and code 402 for the cases from the special sample.
Geographic information
The codes for REGION, DISTRICT, and CITY are consistent with the codes used for the Polish national sample. All cases for Warsaw have REGION code 1 ("xxxx"), DISTRICT code 1 ("xxxx"), and CITY codes 01001 - 01013, corresponding to the seven districts of Warsaw. These are:
01001 - Mokotow
01003 - Ochota
01005 - Praga Poludnie
01007 - Praga Polnoc
01009 - Srodmiescie
01011 - Wola
01013 - Zoliborz
In addition, a special variable, TER_CODE, is used to identify small geographical areas within Warsaw. TER_CODE is an alphanumeric variable that identifies the map page and sector of the map, Warszawa Plan Miasta: Skala ok. 1:18,000 (1992). For example, code 15A3 corresponds to map number 15, sector A-3. Detailed geography was coded to map sectors since, in the judgement of the Polish team, no adequate geographic information was available from the Polish Statistical Office. Users of the small area data are advised to consider GIS techniques to convert the map sectors into more highly aggregated units, particularly as statistical data for such units become available.
Weights
As with the national sample, the Polish research team weighted the data to take account of differential non-response. This was done by comparing a crosstabulation of age by sex by district of residence for the survey data with a corresponding tabulation obtained from the Polish Central Statistical Office and weighting by the reciprocal of the ratio of the two proportions. The resulting weights were normed to 100,000, with a minimum of 74,137 and a maximum of 273,388. We have renormed these weights to 1, so that the weighted and unweighted sample sizes are identical. This of course results in weights ranging from .741 to 2.734. These weights are included in the data as WWEIGHT. Analysts wishing to study Warsaw should always weight their data by WWEIGHT.
The general weight variable, WEIGHT, is given a constant value of 0 for all cases in the Warsaw sample, as a reminder to users that it should not be combined with the Polish national sample nor with the samples from the other countries; mechanical application of the WEIGHT variable will automatically exclude the Warsaw sample.
Why we did not combine the two samples
We explored the possibility of combining the Polish national sample and the Warsaw sample in order to increase the number of cases available for cross-national comparisons, as we had done for the Prague oversample, but decided not to do this for Poland. The main problem is that Warsaw includes only about 4.7 per cent of the population of Poland, and so combining the two samples would entail downweighting the Warsaw data by a factor of nearly seven and upweighting the non-Warsaw portion of the sample by a factor of nearly 1.4. There are two difficulties with this approach: it creates the impression that we have a larger sample than we in fact do have (a national sample of 4880 [= the sum of the two samples minus the duplicated cases], instead of a true national sample of 3520 cases), and--in combination with the fact that the weights the Polish team applied to correct for differential non-response vary by a factor of about two--it produces a set of weights with a very large range: the largest weight could in principle be as much as 35 times as great as the smallest weight, an extremely undesirable outcome.
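The down- and upweighting factors quoted above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, using the sample sizes and the 4.7 per cent population share given in the text; the variable names are ours.

```python
warsaw_share = 0.047        # Warsaw's share of the population of Poland (from the text)
n_national = 3520           # completed interviews in the national sample
n_special = 1360            # completed interviews in the special Warsaw sample
n_warsaw_national = 143     # Warsaw cases within the national sample

n_combined = n_national + n_special              # 4880 cases in a pooled file
n_warsaw = n_special + n_warsaw_national         # 1503 Warsaw cases
n_other = n_combined - n_warsaw                  # non-Warsaw cases

# Weights that would restore the population shares in the pooled file.
w_warsaw = warsaw_share * n_combined / n_warsaw
w_other = (1 - warsaw_share) * n_combined / n_other

downweight_factor = 1 / w_warsaw                 # about 6.6, "nearly seven"
design_ratio = w_other / w_warsaw                # about 9, before non-response weights

print(round(downweight_factor, 2), round(w_other, 2), round(design_ratio, 1))
```

The ratio of about 9 between the two design weights is then multiplied by whatever spread the non-response weights add, which is why the combined weights would have such a large range.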
Russia (by Ludmila Khakhulina and VCIOM staff)
I. Sample Design
The sample for this study was designed as a three-stage stratified random sample. The initial sample size was 5,000 respondents. Information on population size and settlement structure was taken from the 1989 census.
1. Formation of strata
At the first stage of the sample design, all settlements (S) in the territory of Russia were divided into strata. Four stratification factors were used. The first factor was the geographic location of the settlement. In official statistics, the territory of Russia is traditionally divided into 11 major economic regions:
1. Northern
2. North-Western
3. Central
4. Volga-Vyatka
5. Central Black Earth
6. Volga
7. North Caucasian
8. Urals
9. West Siberia
10. East Siberia
11. Far Eastern
The region to which a settlement belongs served as the geographic stratification factor. This made it possible to take into account, in describing the settlements, not only the geographic but also the economic factor.
The second factor was ethnic: whether a settlement belongs to an autonomous unit (autonomous republic, oblast, or district).
The third factor was urbanization. All settlements in each region were divided into urban and rural. Urban settlements were, in turn, divided into types by population size.
For urban settlements a fourth factor was used: administrative status (i.e., whether the city is the administrative center of an oblast, territory, or autonomy).
In dividing the territory of Russia, strata were defined as cross-classifications of the groups of settlements defined by the above-mentioned factors. All strata used in the sample design of this study are listed in Table 1. The table shows each stratum's share of the total population of Russia aged 20 to 80.
The total sample was divided among strata in proportion to the population aged 20 to 80 residing in each stratum. In forming strata, care was taken that the number of respondents in each stratum would not fall below the threshold of n_str_min=10 adopted for one sampling point. Given a total population of M=101 mln people and an overall sample size of N=5,000, the minimum size of a stratum is M/N*10=202,000. If any stratum in a region was smaller than 202,000 or (which is the same) its share was less than 0.2%, it was merged with the stratum closest to it on the stratification factors. In Table 1, merged strata are marked with "{ }", and the overall number of questionnaires in them is shown on the right side of the first of them.
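The threshold arithmetic can be checked with a short Python sketch. The constants are those given in the text; the helper names and the example stratum populations are hypothetical.

```python
M = 101_000_000      # population aged 20 to 80 (from the text)
N = 5_000            # overall sample size
N_STR_MIN = 10       # minimum number of respondents per stratum

people_per_questionnaire = M // N                               # 20,200
min_stratum_population = people_per_questionnaire * N_STR_MIN   # 202,000
min_stratum_share = min_stratum_population / M                  # 0.2% of the population

def must_merge(stratum_population):
    """True if a stratum is too small to stand alone under the rule above."""
    return stratum_population < min_stratum_population

def allocated_questionnaires(stratum_population):
    """Proportional allocation of questionnaires to a stratum."""
    return round(stratum_population / people_per_questionnaire)
```

For instance, a hypothetical stratum of 150,000 people would have to be merged with a neighbouring stratum, while one of 808,000 people would receive 40 questionnaires.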
2. Determination of the Number of Sampling Points in a Stratum
After the stratification was finished, sampling points were selected in each stratum by the probability proportional to size (PPS) method.
In urban strata, specific settlements were selected from the corresponding lists of settlements. The probability of selection was directly proportional to the population of each settlement relative to the overall population of the stratum. Note that the selection of specific settlements in urban strata was carried out by the Sampling Department at VCIOM in Moscow, while the selection of settlements in rural strata was carried out by VCIOM regional offices, following the instruction on selection of rural settlements worked out by the Sampling Department.
Before specific settlements were selected, the total number of sampling points was determined for each stratum. This number was defined as
Entier(n_str/n_str_max)+1
where "n_str" is the total number of questionnaires in a stratum, "n_str_max" is the maximum possible number of respondents in one sampling point of a given stratum, and Entier denotes the integer part. The values of n_str_max for strata of different types were chosen according to the cost of conducting interviews in different settlements.
Note that in calculating the number of sampling points for each stratum, results were rounded; in addition, an equal number of questionnaires was assigned to each sampling point of the same stratum. Therefore, the initial overall number of questionnaires was corrected. Table 1 presents the corresponding values of n_str_max, the calculated number of sampling points in strata, and the corrected number of questionnaires, taking into account the maximum number of questionnaires and the calculated number of questionnaires in a stratum. For example, in the stratum "Cities - administrative centers with population numbered less than 1 million" in the Northern region, the maximum number of questionnaires in a sampling point is 65 and the calculated number in the stratum is 40, while the corrected number of questionnaires is also 40.
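The rule for the number of sampling points can be written out as a short Python sketch. The formula is the one given above; the rounding used to obtain an equal number of questionnaires per point is an assumption, since the report does not say how the correction was made, and the function names are ours.

```python
def sampling_points(n_str, n_str_max):
    """Number of sampling points in a stratum: Entier(n_str / n_str_max) + 1."""
    return n_str // n_str_max + 1

def corrected_questionnaires(n_str, n_str_max):
    """Return (points, questionnaires per point, corrected stratum total).

    Assumption: the per-point number is n_str / points rounded to the nearest
    integer, so that every point in the stratum gets the same number.
    """
    points = sampling_points(n_str, n_str_max)
    per_point = round(n_str / points)
    return points, per_point, points * per_point

# Example from the text: Northern region, 40 questionnaires, at most 65 per point.
print(corrected_questionnaires(40, 65))    # one point, total unchanged at 40
# Hypothetical stratum where the equal split changes the total.
print(corrected_questionnaires(160, 65))
```

In the hypothetical second case the formula gives three sampling points, and an equal split of 53 questionnaires per point changes the stratum total from 160 to 159.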
3. Selection of Settlements (Sampling Points)
After the number of sampling points in each stratum had been determined, specific urban settlements were selected using a random number generator. First, specific settlements were selected in the strata of central cities. For each such stratum, a list of all central cities within the stratum was compiled. Probabilities of selection were proportional to the population of these cities. In selecting settlements in the strata of peripheral towns, not all towns were included in the list of settlements for selection, but only those situated in the same oblasts as the previously selected central cities. This restriction was needed to keep overhead expenses (transport, accommodation, travelling allowance, etc.) down, which in the end reduced the overall expense of conducting the survey. In social structure surveys, because the populations of the various oblasts are relatively similar, this approach to selecting peripheral towns does not lead to noticeable biases in variables. Table 2 lists the settlements in all strata of central cities. For each stratum, the number of selected sampling points is indicated, and for each settlement, the probability of selection and the value of the random number generator. The symbol "*" marks the settlements that were selected. Lists of settlements for the strata containing peripheral towns are not given in the tables because of their unwieldy size.
Because it was practically impossible to conduct surveys in some settlements included in the sample design (difficult access, lack of trained interviewers), regional offices replaced some sampling points, taking into account the administrative status and population of the settlement to be replaced. These replacements were agreed upon with the Sampling Department (Moscow).
Table 3 shows all replacements of sampling points as well as the resulting redistribution of questionnaires among the sampling points within a stratum.
Rural settlements were selected by regional offices according to the following rules:
1) Settlements should be situated at least 50 km from big cities and be evenly distributed among the oblasts, territories, and autonomies included in the sample assigned to the regional office.
2) If more than one settlement is surveyed in a region, they are evenly divided between central farmsteads and peripheral rural settlements.
In all, 68 settlements were selected, including 30 central farmsteads and 38 peripheral rural settlements. Seven settlements were replaced, with some deviations from the procedure; 66 questionnaires were obtained in them.
In total, 143 urban and rural settlements were included in the survey.
4. Selection of Households in Sampling Points
The selection of respondents in sampling points was done by VCIOM regional offices in 2 stages: first, households were randomly selected from previously prepared lists; then, one respondent was selected within each household.
Households in cities and villages were selected by the standard systematic method. Note that households were selected with a 30% reserve to allow for cases of ineligible respondents.
Interviewers were assigned tasks according to the lists of selected households. One interviewer was assigned at most 10-12 interviews.
5. Selection of Respondent inside a Household
The respondent within a household was selected by the following procedure:
1) If nobody was at home (nobody opened the door), the address was visited at least 3 times, and only after 2 callbacks was it replaced with another address.
2) If the door was opened, the interviewer was to enquire with any or all of those present about the following:
a) Nfam - the number of families living in this house/apartment;
Nadr - the total number of people living at this address.
If several families were living in the house/apartment, the interviewer first selected the family. To select the family (e.g., in a communal flat), a list of all families was compiled and numbered arbitrarily; the number of the family from which the respondent was to be selected was then equal to (remainder of Nadr/Nfam) + 1.
b) list of all members of the selected family living at this address;
c) in the obtained list, those people were marked who met the conditions of the study, i.e., all persons older than 20 and younger than 80 years of age;
d) for each of those meeting the conditions of the study, the date of birth was ascertained, i.e., the day and month.
3) From the list obtained in b), the eligible person whose birthday was closest to the date of the survey was selected.
4) After the respondent had been selected, there were two possibilities:
a) the respondent was at home at the moment, and in this case the interview was conducted;
b) the respondent was not at home. In case of complete inaccessibility, such as long absence (business trip, leave, in hospital, etc.) or active military service, the address was replaced; in case of a short absence, the address was visited 3 times. It was prohibited to interview another member of the family. If the respondent was never found at home, the address was replaced.
5) All the lists from which the respondent had been selected were enclosed with the questionnaire.
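The family and respondent selection rules can be sketched in Python as follows. The household data are invented and the function names are ours. Two points are assumptions: "closest to the date of survey" is read as closest in either direction on the calendar, and the age condition is applied exactly as worded in c) (strictly older than 20 and younger than 80).

```python
from datetime import date

def select_family(n_adr, n_fam):
    """Family number (1-based): remainder of Nadr / Nfam, plus 1."""
    return n_adr % n_fam + 1

def birthday_distance(month, day, survey_date):
    """Days between the survey date and the nearest occurrence of a birthday."""
    distances = []
    for year in (survey_date.year - 1, survey_date.year, survey_date.year + 1):
        try:
            birthday = date(year, month, day)
        except ValueError:                 # 29 February in a non-leap year
            birthday = date(year, 2, 28)
        distances.append(abs((birthday - survey_date).days))
    return min(distances)

def select_respondent(members, survey_date):
    """Pick the eligible family member whose birthday is closest to the survey date."""
    eligible = [m for m in members if 20 < m["age"] < 80]
    if not eligible:
        return None
    return min(eligible,
               key=lambda m: birthday_distance(m["month"], m["day"], survey_date))

# Invented household: 7 people at the address in 3 families.
family_number = select_family(n_adr=7, n_fam=3)
family = [
    {"name": "Anna", "age": 45, "month": 6, "day": 1},
    {"name": "Boris", "age": 47, "month": 4, "day": 20},
    {"name": "Vera", "age": 19, "month": 5, "day": 16},    # not eligible (too young)
    {"name": "Ivan", "age": 72, "month": 12, "day": 30},
]
chosen = select_respondent(family, date(1993, 5, 15))
```

Because the choice depends only on birth dates and the survey date, the interviewer has no discretion over which household member is interviewed.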
II. A Posteriori Control of the Sample
One way of checking the representativeness of the sample a posteriori is to compare the survey results with statistical data. Table 4 compares selected data obtained in the present study with data of the State Committee for Statistics, based on the 1989 census. The comparison covers the main socio-demographic indicators: sex, age, education, marital status, and occupation.
The comparison reveals a considerable bias in the level of education (see Table 4).
Analysis of Biases in the Education Level:
1. The educational structure of Russia's population depends on many factors, the principal of which are sex, age, type of location (city or village), and the population of the settlement. The state statistics based on the 1989 census contain average data on the level of education in settlements differing in type and population size, regardless of sex and age. Urban settlements are divided into 6 types depending on their population, i.e.,
1) less than 20,000
2) 20,000-50,000
3) 50,000-100,000
4) 100,000-250,000
5) 250,000-500,000
6) more than 500,000.
Moscow and St. Petersburg are treated separately, since the level of education in these cities is considerably higher even than in other cities with populations over 500,000. On the basis of statistical data, means were obtained for each of these types of urban settlement as well as for all rural settlements. Table 5 gives data on the level of education in each category and for Russia as a whole, obtained from statistical data, as well as each category's share of the population of Russia aged 20 to 80. For comparison, the shares of each category in the realized sample are presented, and on this basis the data on the level of education for Russia as a whole are corrected.
The analysis of the data in Table 5 shows that the comparatively great shortage in the sample of "low educated" settlements with populations of less than 20,000 and the overrepresentation of "high educated" settlements result in an overall increase in the proportion with higher education of 0.4 percent.
2. People with higher education spend more time at home and are easier for an interviewer to contact, i.e., they are a more accessible group than "uneducated" people. This error could be partly reduced by increasing the number of callbacks from 2 to 7-8, but this would considerably increase the cost of fieldwork.
In the present study 5,025 questionnaires were actually completed. The main reasons for unrealized interviews were interrupted interviews, refusals, and the respondent's absence from home after two callbacks. By expert judgement, people with higher education are found approximately 3 times less frequently among respondents in these three groups than in the surveyed population. As a result, the level of higher education in the sample has increased by 3-3.5%.
Thus, for the reasons mentioned, the level of higher education in the sample has risen to 23.1%. To reduce this bias, we recommend that statistical analysis of the survey data be carried out after weighting the data. We are ready to provide the necessary set of weights for the questionnaires.
III. Fieldwork
1. Preparatory Work
The survey "Social Stratification in Eastern Europe" was conducted among the urban and rural population in 18 regions of Russia by 17 VCIOM regional offices. The following preparatory work was done.
1. Chiefs of the regional offices participating in the survey were invited to Moscow. After familiarizing themselves with the questionnaire, they were briefed on survey procedures, methods of respondent selection, instruction of interviewers, and post-fieldwork processing of questionnaires.
2. For this survey, the most skilled interviewers were selected from the permanent interviewer staff. Each interviewer was familiarized with the questionnaire, the accompanying cards, and the "Interviewer's Instruction", listened to the tape-recorded "Instruction on Interview Conducting", was instructed by the survey supervisor on methods of interviewing and on selection of a respondent within a household with two callbacks, and was familiarized with the rules for filling in the form "Registration of Attempts to Conduct an Interview".
3. For this survey, the following documents regulating the fieldwork were worked out and sent to VCIOM regional offices:
1. Instruction to the fieldwork organizer
2. Instruction to an interviewer
3. Instruction on composing the list of addresses in a sampling point
4. Instruction to an interviewer on selecting a respondent inside a household
5. Tape-cassette of the "Instruction on completing the questionnaire"
6. Sampling design for each regional office
7. Registration forms of attempts to conduct an interview
8. Cards of control of interviewers' work
2. Time of Fieldwork
The survey was conducted by face-to-face interview in respondents' homes.
The fieldwork stage of the study was conducted from April 24 through July 6, 1993. As a result of post-fieldwork examination, 37 questionnaires were rejected in the Volga-Vyatka region and 35 in the Urals region, and these had to be administered again. At the North Caucasian office, one interviewer became the victim of a robbery and 65 questionnaires were lost together with the luggage; these also had to be replaced. For these reasons, the fieldwork stage was finally completed by August 12, 1993.
3. Analysis of Registration Forms of Attempts to Conduct an Interview
The analysis of the registration forms of attempts to conduct an interview shows that 77.3% of questionnaires were completed after one visit, 26.5% after 2 visits, and 6.2% after 3 visits, i.e., two callbacks.
The reasons for which an interview failed to be conducted during the first attempt were as follows:
A respondent was not at home 37%
A respondent did not have time 36%
A respondent was absent for a long time (business travel, leave) 5%
Other (illness, inability to answer the questions) 22%
4. Length of Interview
The minimum time taken to complete the questionnaire was 25 minutes (at the Zhiguli office) and the maximum was 180 minutes (at the Zhiguli and Udmurt offices). On average, the questionnaire was completed in 1 hour 20 minutes. The duration of the interview was not influenced by the status of the place of residence. The majority of respondents and interviewers remarked on the great length of the interview, due to the large size of the questionnaire. The minimum interview time is explained by the fact that some sections of the questionnaire were conditional and were not answered by some respondents.
5. The Quantity of Conducted Interviews
Completed questionnaires - 5,025 (*).
The overall response rate was 76%.
Nonresponses:
Respondent not found (after 2 callbacks): 274 (18% of all nonresponses)
Interrupted interviews: 474 (31%).
Refusals: 612 (41%).
Other reasons (illness, inability to answer, etc.): 144 (10%).
(*) 23 questionnaires rejected after checking.
Table 5 contains data on the planned sample (Plan) and its actual fulfillment (Fact) in all regional offices participating in the survey.
IV. Post-fieldwork Processing of Questionnaires
1. At visual examination, questionnaires were rejected in which the cover sheet was completed with deviations from the accepted standard, or in which answers to questions about the year and place of birth or sex of the respondent (or to other key questions), or to entire sets of unconditional questions, were missing. After the first stage, 19 questionnaires were rejected. At the second stage of the visual examination, compliance with the filter questions was checked. Codes of locations and residential places were also checked automatically.
2. Coding
The coding was done both in regional offices and in the Moscow headquarters by a specialized group of coders. All the country-specific code lists are enclosed with the report.
Table 4 Posterior Analysis
Table 5
Table 6.
Distribution of questionnaires by sampling points (Villages: CF - central farmstead; O - outlying village; DC - district center; A - in autonomous region/district)
Note that the Slovak questionnaire is, with one exception, identical to the Czech questionnaire. The exception is that the trade unions in Slovakia are different from those in the Czech Republic. In addition, of course, the Slovak questionnaire was administered in Slovak while the Czech questionnaire was administered in Czech.
A. SAMPLING
In accordance with the agreement between the Institute of Sociology of the Slovak Academy of Sciences, the Institute of Sociology of the Czech Academy of Sciences and the Slovak Radio (Methodic and Research Department), the general population survey (SSEE) was performed by Slovak Radio. The SSEE was carried out in May and September-October 1993 on a random sample of 5 000 respondents aged 20 to 70 years, which is 0.15% of the population of the Slovak Republic in this age range.
The sampling procedure for the SSEE survey
Respondents were selected using a two-stage random sampling procedure:
1. All communities of the Slovak Republic were classified into 7 categories (strata) according to the population size of the locality
- up to 2 000 inhabitants
- 2 000 - 5 000 inhabitants
- 5 000 - 10 000 inhabitants
- 10 000 - 20 000 inhabitants
- 20 000 - 50 000 inhabitants
- 50 000 - 100 000 inhabitants
- over 100 000 inhabitants
All towns in the categories with 20 000 or more inhabitants were included in the sample. From the other categories (fewer than 20 000 inhabitants) the localities were selected proportionally.
2. Addresses in communities were randomly selected from the lists of voters in the 1992 elections according to the following instruction. Every town or selected local government office was asked for its list of voters. The total number of voters was divided by 20 and the result was rounded down to a whole number. The result was used as the sampling step. Addresses were selected using a random number chosen by each interviewer and the sampling step. The age of the respondent was also recorded from the list of voters. If the respondent at a selected address was older than 70 years, the instruction was to choose the next name, but to calculate the following address from the original position.
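The instruction for working through a voter list can be sketched in Python. The voter list is invented and the function name is ours; the sketch assumes that dividing by 20 implies 20 selections per list, and that the interviewer's random number serves as a 0-based starting position smaller than the step.

```python
def select_from_voter_list(voters, start, per_list=20, max_age=70):
    """Systematic selection with the 'next name' rule for voters over 70.

    voters: list of (name, age) in list order.
    start: random starting position chosen by the interviewer (0-based, < step).
    """
    step = len(voters) // per_list          # sampling step, rounded down
    selected = []
    for i in range(per_list):
        pos = start + i * step              # position on the original grid
        j = pos
        while j < len(voters) and voters[j][1] > max_age:
            j += 1                          # too old: take the next name on the list
        if j < len(voters):
            selected.append(voters[j])
        # The next grid position is computed from pos, not from j.
    return selected

# Invented list of 100 voters, all aged 30 except the eighth, who is 75.
voters = [("v%d" % i, 30) for i in range(100)]
voters[7] = ("v7", 75)
picked = select_from_voter_list(voters, start=2)
```

With a step of 5 and a start of 2, the second grid position falls on the 75-year-old, so the next name is taken instead, while the third selection still comes from the original grid position.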
The selected addresses and names were sent to Slovak Radio. Slovak Radio randomly selected 6216 addresses of respondents (a 24% oversample). Respondents were informed of the interviewer's visit by an announcement letter, which is standard procedure in Slovak Radio field work.
B. Data collection and response rates
The interviewers of the Slovak Radio network were specially trained for the SSEE by the Institutes of Sociology (from both Prague and Bratislava). The interviewers were instructed to meet selected respondents for an interview or to arrange an appropriate date for the interview. The reliability of interviewers was checked through the return of the announcement letters previously sent to respondents.
The field work started on May 10, 1993, and ended May 31, 1993. From 6 216 addresses, 4201 interviews were completed. The overall response rate was 67.6%; in Bratislava it was only 41.0%.
The unusually low response rate, particularly in Bratislava and other larger towns, was due to two factors:
- a high number of people had moved in the period after the 1992 elections, so that they were not present in the home/apartment during the survey,
- negative attitudes towards giving personal information, particularly in view of the length of the SSEE questionnaire.
For that reason it was decided to perform a second wave of data collection in Bratislava and other larger towns, to reach the required 5 000 completed interviews and a sample representative of the residential structure of the Slovak Republic. In the second wave an additional 1120 addresses were selected in the localities with the worst response rates. The second wave of field work started on September 25, 1993, and ended October 7, 1993. There were 760 completed interviews in the second wave of data collection, a response rate of 67.9%. The second wave confirmed the overall pattern of response rates from the first one.
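The response rates and the oversample quoted for the two waves follow directly from the counts given above, as this short Python check shows. The variable and function names are ours.

```python
def response_rate(completed, issued):
    """Completed interviews as a percentage of issued addresses."""
    return 100 * completed / issued

wave1 = response_rate(4201, 6216)          # first wave, May 1993
wave2 = response_rate(760, 1120)           # second wave, September-October 1993
oversample = 100 * (6216 / 5000 - 1)       # addresses issued beyond the 5 000 target
total_completed = 4201 + 760               # interviews completed in both waves

print(round(wave1, 1), round(wave2, 1), round(oversample), total_completed)
```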
The average length of the interview was 1 hour and 41 minutes, with a standard deviation of 33 minutes.
TABLES
Table 1: Categories of interview length
Table 2: Response rate by regions (1st and 2nd wave)
Table 3: Non-response reasons (percent)
C. Geographical and occupational coding
The coding of geographical units and occupations in the Slovak sample was based on the common Czechoslovak version of geographical and occupational codes. This part of the post-field work was checked at the Institute of Sociology of the Czech Academy of Sciences in Prague by M. Tucek.
The coding of geographical units followed the rules endorsed by the international research team. The list of residential units classified in Slovak statistics as "towns" was slightly modified by adding urban residential units that belong to urban or city agglomerations. Each town was then assigned a county code (four digits) and a sequential number for the town within the county (three digits). Other residential units (villages) were coded by the county code and the common code 666.
Special codes:
7777 residential units abroad
981 Hungary
982 Poland
983 former Soviet Union
984 other country in "Eastern Europe" (former communist)
985 Germany (former West Germany)
986 Austria
987 other country in "Western Europe"
989 USA, Canada, Australia
990 other
8888 information not available
888 no fixed place, frequently on move
999 information is missing, refused
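As a rough illustration of how such two-part codes might be read back, the following sketch decodes a (county code, unit code) pair. It rests on assumptions not confirmed by the report: that the two parts are stored as separate fields, that 7777 is always paired with one of the country codes, and that 8888 is paired with 888 or 999. The function name `describe_place` and the county number 1234 in the usage lines are hypothetical.

```python
VILLAGE = 666          # common code for villages within a county
ABROAD = 7777          # residential units abroad
NOT_AVAILABLE = 8888   # information not available

COUNTRIES = {
    981: "Hungary",
    982: "Poland",
    983: "former Soviet Union",
    984: 'other country in "Eastern Europe" (former communist)',
    985: "Germany (former West Germany)",
    986: "Austria",
    987: 'other country in "Western Europe"',
    989: "USA, Canada, Australia",
    990: "other",
}

UNAVAILABLE = {
    888: "no fixed place, frequently on move",
    999: "information is missing, refused",
}


def describe_place(county: int, unit: int) -> str:
    """Return a readable label for a (county, unit) code pair."""
    if county == ABROAD:
        return "abroad: " + COUNTRIES.get(unit, "unlisted country code")
    if county == NOT_AVAILABLE:
        return "not available: " + UNAVAILABLE.get(unit, "unlisted reason code")
    if unit == VILLAGE:
        return f"village in county {county:04d}"
    return f"town {unit:03d} in county {county:04d}"


# Usage with a hypothetical county code 1234.
print(describe_place(1234, 2))
print(describe_place(1234, 666))
print(describe_place(7777, 986))
```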
Occupational coding
Occupational coding was based on the Czechoslovak version of the International Standard Classification of Occupations 1988 (ISCO-88), developed by the Czechoslovak Statistical Office and the Institute of Sociology, Czechoslovak Academy of Sciences (Mateju, Tucek, Voknerova) in 1991-1992. The four-digit Czechoslovak version of ISCO-88 was then modified by adding special codes suggested by Treiman and Szelenyi for the analysis of the development of the occupational structure before the collapse of the communist regime and of mobility during the postcommunist transformation. A recoding procedure was then developed at the Institute of Sociology of the Czech Academy of Sciences in Prague for converting the Czechoslovak national coding system into both the standard international coding system and the international system as modified at UCLA for our project.
D. Comparison of basic distributions from SSEE with Census data
The main distributions confirm the similarity between the Census data and the SSEE sample in the Slovak Republic (Table 4). Differences in the percentages of age cohorts show that the youngest cohort is the most undersampled and that respondents under 35 years of age are undersampled in general. We explain this tendency by the higher mobility (both long-term and short-term) of the youngest cohort and of the younger cohorts generally. The underrepresentation of younger cohorts results from the sampling procedure, in which addresses were randomly selected from a list that was one year old. Weighting may be necessary when evaluating mobility in the period after 1989, and particularly in the last two years.
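If weighting by age cohort is wanted, one common approach is post-stratification, where each cohort's weight is its Census share divided by its sample share. The sketch below is not the procedure used for the SSEE data, and the percentages in it are HYPOTHETICAL placeholders, not the values from Table 4; the function name `poststratification_weights` is likewise an assumption.

```python
def poststratification_weights(census_pct: dict, sample_pct: dict) -> dict:
    """Weight per cohort = Census share / sample share."""
    weights = {}
    for cohort, census_share in census_pct.items():
        sample_share = sample_pct[cohort]
        if sample_share <= 0:
            raise ValueError(f"cohort {cohort!r} has no sample cases")
        weights[cohort] = census_share / sample_share
    return weights


# HYPOTHETICAL percentages for illustration only (not from Table 4).
census_pct = {"under 35": 40.0, "35-54": 35.0, "55 and over": 25.0}
sample_pct = {"under 35": 32.0, "35-54": 38.0, "55 and over": 30.0}

weights = poststratification_weights(census_pct, sample_pct)

# Applying the weights to the sample shares recovers the Census shares.
weighted_pct = {c: sample_pct[c] * w for c, w in weights.items()}

for cohort, w in weights.items():
    print(f"{cohort}: weight {w:.3f}")
```

An undersampled cohort receives a weight above 1 and an oversampled cohort a weight below 1, so the weighted age distribution matches the Census.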
Table 4: Elementary comparison of distribution with the Census (percent)
An interesting confirmation of the validity of the SSEE sample in the Slovak Republic is given by comparing the party that respondents reported voting for in 1992 with the actual election results.
Table 5: Declared votes and result of elections in 1992
Both the declared nationality of respondents and the party they reported voting for in 1992 make it evident that the Hungarian minority was undersampled. The same may be suspected of the Romany minority.
E. Postfield work data processing
There were three stages of data checking and cleaning. First, those cases in which crucial questions or tables were not answered (life history, education, etc.) or in which the questionnaire showed apparent inconsistencies were removed from the data file.
Second, obvious errors in coding or in data entry were checked against the questionnaires and corrected. Third, the process of logical controls and cross-checking for consistency between various answers started in November 1993.
The cleaning stage revealed that the questionnaire was unusually difficult even for well trained and experienced interviewers, mostly because of the complexity created by its length and by the roster schemes it used. Serious problems of consistency between answers were found, particularly in the interconnected rosters (education - activity history - self-employment - part-time economic activity). The Slovak data file was mailed to UCLA, along with the file containing nationally specific variables (for questions that appeared only in the Czech and Slovak questionnaires), by the Institute of Sociology of the Academy of Sciences of the Czech Republic in Prague.
F. Funding and acknowledgements
All costs of the preparation and production of instruments, data collection, coding and data entry for the Slovak Republic general population survey SSEE were covered by the Institute of Sociology of the Slovak Academy of Sciences and by the Institute of Sociology of the Academy of Sciences of the Czech Republic. The latter received major support for the project from the U.S. National Science Foundation through a grant awarded to the principal investigators of the international research project (Donald J. Treiman and Ivan Szelenyi, both of UCLA). A second research grant was awarded to the Slovak research team by the Grant Agency of the Slovak Academy of Sciences (to Jan Buncak, for the project The Political System and the Social Structure of Slovakia in the Period of Transformation toward Democracy).
Bratislava, February 24, 1994.