| Type: | Package | 
| Title: | Computation and Decomposition of the Mutual Information Index | 
| Version: | 2.0.4 | 
| Date: | 2025-07-28 | 
| Maintainer: | Cristian Angulo-Gonzalez <cristian_world@hotmail.cl> | 
| Description: | The Mutual Information Index (M) introduced to social science literature by Theil and Finizza (1971) <doi:10.1080/0022250X.1971.9989795> is a multigroup segregation measure that is highly decomposable and that according to Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008> and Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> satisfies the Strong Unit Decomposability and Strong Group Decomposability properties. This package allows computing and decomposing the total index value into its "between" and "within" terms. These last terms can also be decomposed into their contributions, either by group or unit characteristics. The factors that produce each "within" term can also be displayed at the user's request. The results can be computed considering a variable or sets of variables that define separate clusters. | 
| License: | GPL-3 | 
| Imports: | Rcpp (≥ 1.0.13), data.table, parallel, runner, stats | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| URL: | https://github.com/RafaelFuentealbaC/mutualinf | 
| BugReports: | https://github.com/RafaelFuentealbaC/mutualinf/issues | 
| Depends: | R (≥ 2.10) | 
| Collate: | 'Data_Source.R' 'get_internal_data.R' 'get_contribution.R' 'M_value.R' 'M.R' 'get_general_contribution.R' 'get_proportion.R' 'M_within.R' 'M_within_inv.R' 'RcppExports.R' 'globals.R' 'mutual.R' 'prepare_data.R' | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | yes | 
| Packaged: | 2025-08-07 13:45:30 UTC; crist | 
| Author: | Cristian Angulo-Gonzalez [aut, cre], Rafael Fuentealba-Chaura [aut], Ricardo Mora [aut], Julio Rojas-Mora [aut], FONDECYT/ANID Project 11170583 [fnd], MCIN/AEI/10.13039/501100011033 (Project no. PID2019-108576RB-I00) [fnd], UCT VIP Project FEQUIP2019-INRN-03 [fnd] | 
| Repository: | CRAN | 
| Date/Publication: | 2025-08-24 06:20:02 UTC | 
An R package to compute and decompose the Mutual Information Index (M).
Description
The Mutual Information Index (M) was introduced to the social sciences by Theil and Finizza (1971). It is a highly decomposable multigroup segregation measure that satisfies both the Strong Unit Decomposability (SUD) and the Strong Group Decomposability (SGD) properties (Frankel and Volij, 2011; Mora and Ruiz-Castillo, 2011).
The package allows for:
- The computation of the M index, either overall or over subsamples defined by the user. 
- The decomposition of the M index into a "between" and a "within" term. 
- The identification of the "exclusive contributions" of segregation sources defined either by group or unit characteristics. 
- The computation of all the elements that make up the "within" term in the decomposition. 
- Fast computation using more than one CPU core on Mac, Linux, Unix, and BSD systems. This option relies on the data.table and parallel libraries; on Windows, computation is limited to a single CPU core. 
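As a rough illustration of what the index measures, the sketch below computes M by hand from a small frequency table using the textbook definition, the sum over cells of p_gu * log(p_gu / (p_g * p_u)). It uses base R only and is not the package's implementation. The counts, the helper name m_index, and the use of natural logarithms are assumptions made for this example.

```r
# Toy frequency table: rows are groups, columns are units (e.g., schools).
# The counts are made up for illustration only.
counts <- matrix(
  c(30, 10,
    10, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), unit = c("s1", "s2"))
)

# Hypothetical helper: M = sum over (g, u) of p_gu * log(p_gu / (p_g * p_u)),
# with empty cells contributing zero. Natural logarithms are assumed.
m_index <- function(counts) {
  p_gu <- counts / sum(counts)   # joint proportions
  p_g <- rowSums(p_gu)           # group marginals
  p_u <- colSums(p_gu)           # unit marginals
  expected <- outer(p_g, p_u)    # proportions under no segregation
  pos <- p_gu > 0                # skip empty cells
  sum(p_gu[pos] * log(p_gu[pos] / expected[pos]))
}

m_toy <- m_index(counts)

# A table whose rows are proportional has no segregation, so M is zero.
m_even <- m_index(matrix(c(20, 20, 10, 10), nrow = 2, byrow = TRUE))
```

The index is zero when every unit has the same group composition and grows as groups concentrate in different units.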
Author(s)
Rafael Fuentealba-Chaura rafael.fuentealba97@gmail.com
Cristian Angulo-Gonzalez cristian_world@hotmail.cl
Ricardo Mora ricmora@eco.uc3m.es
Julio Rojas-Mora julio.rojas@uct.cl
References
Frankel, D. and Volij, O. (2011). Measuring school segregation. Journal of Economic Theory, 146(1):1-38. doi:10.1016/j.jet.2010.10.008.
Guinea-Martin, D., Mora, R., and Ruiz-Castillo, J. (2018). The evolution of gender segregation over the life course. American Sociological Review, 83(5):983-1019. doi:10.1177/0003122418794503.
Mora, R. and Guinea-Martin, D. (2021). Computing decomposable multigroup indexes of segregation. UC3M Working papers, Economics 31803, Universidad Carlos III de Madrid. Departamento de Economía.
Mora, R. and Ruiz-Castillo, J. (2011). Entropy-based segregation indices. Sociological Methodology, 41(1):159-194. doi:10.1111/j.1467-9531.2011.01237.x.
Theil, H. and Finizza, A. J. (1971). A note on the measurement of racial integration of schools by means of informational concepts. The Journal of Mathematical Sociology, 1(2):187-193. doi:10.1080/0022250X.1971.9989795.
Segregation data in southern Chile
Description
The data set included in this package was built from two sources. The first is the student enrollment reported by the Ministry of Education (MINEDUC, https://datosabiertos.mineduc.cl/) for students of primary education (first eight years of formal education) who attended establishments officially recognized by the State. The second comprises the Quality and Context of Education Questionnaire for Parents and Guardians and the Student Questionnaire, both applied by the Education Quality Agency (https://www.agenciaeducacion.cl/) to all students in grades 4 and 8 of primary education. Both sources are limited to the period 2016-2018. The data set contains information on student and educational system characteristics in southern Chile (Biobio, La Araucania and Los Rios regions).
Usage
DF_Seg_Chile
Format
A data.frame with 191495 observations and 11 variables:
- year: Student enrollment year, from 2016 to 2018. 
- school: School ID (RBD, Rol de Base de Datos). 
- district: Administrative district where the school is located. 
- csep: Preferential Scholar Subsidy Category (from the Spanish Categoría de Subvención Escolar Preferencial). Students belong to the non-subsidized, the partially-subsidized, or the subsidized group according to Act 20.248 on the Preferential Scholar Subsidy (SEP). 
- ethnicity: Self-reported Mapuche ethnicity. Students either belong to the Mapuche ethnicity or not. 
- rural: School with multiage classrooms. The school is either located in an urban zone or not. 
- region: Administrative region where the school is located. Schools belong to the Biobio, La Araucania, or Los Rios region. 
- sch_type: Whether the school is public, charter, or private. 
- gender: Student gender code. Students are coded as either female or male. 
- grade: Student grade. Students belong to either the 4th (4) or the 8th (8) grade of basic school. 
- nobs: Number of students in a cell or combination of variables. 
Source
Ministry of Education (MINEDUC): https://datosabiertos.mineduc.cl/
Education Quality Agency: https://www.agenciaeducacion.cl/
Segregation data in southern Chile
Description
The data set included in this package was built from two sources. The first is the student enrollment reported by the Ministry of Education (MINEDUC, https://datosabiertos.mineduc.cl/) for students of primary education (first eight years of formal education) who attended establishments officially recognized by the State. The second comprises the Quality and Context of Education Questionnaire for Parents and Guardians and the Student Questionnaire, both applied by the Education Quality Agency (https://www.agenciaeducacion.cl/) to all students in grades 4 and 8 of primary education. Both sources are limited to the period 2016-2018. The data set contains information on student and educational system characteristics in southern Chile (Biobio, La Araucania and Los Rios regions).
Usage
DT_Seg_Chile
Format
A data.table with 55960 observations and 11 variables:
- year: Student enrollment year, from 2016 to 2018. 
- school: School ID (RBD, Rol de Base de Datos). 
- district: Administrative district where the school is located. 
- csep: Preferential Scholar Subsidy Category (from the Spanish Categoría de Subvención Escolar Preferencial). Students belong to the non-subsidized, the partially-subsidized, or the subsidized group according to Act 20.248 on the Preferential Scholar Subsidy (SEP). 
- ethnicity: Self-reported Mapuche ethnicity. Students either belong to the Mapuche ethnicity or not. 
- rural: School with multiage classrooms. The school is either located in an urban zone or not. 
- region: Administrative region where the school is located. Schools belong to the Biobio, La Araucania, or Los Rios region. 
- sch_type: Whether the school is public, charter, or private. 
- gender: Student gender code. Students are coded as either female or male. 
- grade: Student grade. Students belong to either the 4th (4) or the 8th (8) grade of basic school. 
- nobs: Number of students in a cell or combination of variables. 
Source
Ministry of Education (MINEDUC): https://datosabiertos.mineduc.cl/
Education Quality Agency: https://www.agenciaeducacion.cl/
Segregation data in southern Chile
Description
The data set included in this package was built from two sources. The first is the student enrollment reported by the Ministry of Education (MINEDUC, https://datosabiertos.mineduc.cl/) for students of primary education (first eight years of formal education) who attended establishments officially recognized by the State. The second comprises the Quality and Context of Education Questionnaire for Parents and Guardians and the Student Questionnaire, both applied by the Education Quality Agency (https://www.agenciaeducacion.cl/) to all students in grades 4 and 8 of primary education. Both sources are limited to 2018. The data set contains information on student and educational system characteristics in southern Chile (Biobio, La Araucania and Los Rios regions).
Usage
DT_test
Format
A data.table with 6703 observations and 5 variables, for testing purposes only:
- school: School ID (RBD, Rol de Base de Datos). 
- csep: Preferential Scholar Subsidy Category (from the Spanish Categoría de Subvención Escolar Preferencial). Students belong to the non-subsidized, the partially-subsidized, or the subsidized group according to Act 20.248 on the Preferential Scholar Subsidy (SEP). 
- ethnicity: Self-reported Mapuche ethnicity. Students either belong to the Mapuche ethnicity or not. 
- region: Administrative region where the school is located. Schools belong to the Biobio, La Araucania, or Los Rios region. 
- fw: Number of students in a cell or combination of variables. 
Source
Ministry of Education (MINEDUC): https://datosabiertos.mineduc.cl/
Education Quality Agency: https://www.agenciaeducacion.cl/
Computes and decomposes the Mutual Information index
Description
Computes and decomposes the Mutual Information index into "between" and "within" terms. The "within" terms can also be decomposed into "exclusive contributions" of segregation sources defined either by group or unit characteristics. The mathematical components required to compute each "within" term can also be displayed at the user's request. The results can be computed over subsamples defined by the user.
Usage
mutual(
  data,
  group,
  unit,
  within = NULL,
  by = NULL,
  contribution.from = NULL,
  components = FALSE,
  cores = NULL
)
Arguments
| data | An object of the "data.table" and "mutual.data" classes. | 
| group | A categorical variable name or vector of categorical variable names contained in  | 
| unit | A categorical variable name or vector of categorical variable names contained in  | 
| within | A categorical variable name or vector of categorical variable names contained in  | 
| by | A categorical variable name or vector of categorical variable names contained in  | 
| contribution.from | A character value that can be 'group_vars' or 'unit_vars', or a categorical variable name or vector of categorical variable names contained in the  | 
| components | A boolean value. If TRUE and the  | 
| cores | A positive integer. Defines the number of CPU cores to use in parallelization tasks. If  | 
Details
Mixing group variables with unit variables in contribution.from will produce an error.
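The "between" and "within" terms can be checked by hand on a small table. The sketch below is a base R illustration of the standard additive decomposition, with schools nested in regions: total M equals M computed across regions plus the region-share-weighted sum of M computed inside each region. It is not the package's internal code. The counts, the region assignment, and the helper m_index are hypothetical, and natural logarithms are assumed.

```r
# Hypothetical counts: 2 groups x 4 schools, with schools nested in 2 regions.
counts <- matrix(
  c(40, 10, 20, 30,
    10, 40, 30, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), school = paste0("s", 1:4))
)
region <- c(s1 = "north", s2 = "north", s3 = "south", s4 = "south")

# Hypothetical helper: M from a groups-by-units table of counts.
m_index <- function(counts) {
  p_gu <- counts / sum(counts)
  expected <- outer(rowSums(p_gu), colSums(p_gu))
  pos <- p_gu > 0
  sum(p_gu[pos] * log(p_gu[pos] / expected[pos]))
}

# Between term: segregation of groups across regions.
by_region <- t(rowsum(t(counts), group = region))
m_between <- m_index(by_region)

# Within term: M inside each region, weighted by the region's share of students.
regions <- unique(region)
shares <- sapply(regions, function(r) sum(counts[, region == r]) / sum(counts))
m_local <- sapply(regions, function(r) m_index(counts[, region == r, drop = FALSE]))
m_within <- sum(shares * m_local)

# Total segregation across schools.
m_total <- m_index(counts)
```

Comparing m_total with m_between + m_within, for instance with all.equal(), confirms that the two terms add up to the total.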
Value
A data.table if the components option is FALSE; a list if the components option is TRUE,
the within option is not NULL and the by option is NULL; or a list of lists if the components
option is TRUE, and both within and by options are not NULL.
References
Frankel, D. and Volij, O. (2011). Measuring school segregation. Journal of Economic Theory, 146(1):1-38. doi:10.1016/j.jet.2010.10.008.
Guinea-Martin, D., Mora, R., and Ruiz-Castillo, J. (2018). The evolution of gender segregation over the life course. American Sociological Review, 83(5):983-1019. doi:10.1177/0003122418794503.
Mora, R. and Guinea-Martin, D. (2021). Computing decomposable multigroup indexes of segregation. UC3M Working papers, Economics 31803. Universidad Carlos III de Madrid. Departamento de Economía.
Mora, R. and Ruiz-Castillo, J. (2011). Entropy-based segregation indices. Sociological Methodology, 41(1):159-194. doi:10.1111/j.1467-9531.2011.01237.x.
Theil, H. and Finizza, A. J. (1971). A note on the measurement of racial integration of schools by means of informational concepts. The Journal of Mathematical Sociology, 1(2):187-193. doi:10.1080/0022250X.1971.9989795.
Examples
# To compute the overall measure of school segregation by socioeconomic and ethnic status.
mutual(data = DT_test, group = c("csep", "ethnicity"), unit = "school")
# Computation of the exclusive effect of specific segregation sources on the overall measure, e.g.,
# socioeconomic and ethnic contributions, and the contribution that cannot be attributed to any of
# them (the "interaction" term).
mutual(data = DT_test, group = c("csep", "ethnicity"), unit = "school", by = "region",
       contribution.from = "group_vars")
# For more information on the package, refer to the manual and the README file.
Prepares the data to be used by the mutual function
Description
Takes a tabular object (micro-data or a frequency table) and returns a
data.table ready for mutual.
The output:
- stores every analytical variable as a factor;
- holds the weight variable under the unified name fw (numeric);
- aggregates identical combinations (summing their weights) and drops rows where fw == 0.
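The three steps listed above can be mimicked in base R to see what shape of table results. This is only a sketch of the described behaviour, not the code used by prepare_data. The micro-data are invented, and treating each row as one observation when no weight column exists is an assumption of this example.

```r
# Hypothetical micro-data: one row per student.
micro <- data.frame(
  school = c("s1", "s1", "s1", "s2", "s2"),
  csep = c("yes", "yes", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# Assumption: with no weight column, each row counts as one observation.
micro$fw <- 1

# Store the analytical variables as factors.
vars <- c("school", "csep")
micro[vars] <- lapply(micro[vars], factor)

# Sum the weights of identical combinations, then drop empty cells.
freq <- aggregate(fw ~ school + csep, data = micro, FUN = sum)
freq <- freq[freq$fw > 0, ]
```

The result is a frequency table with one row per observed combination of school and csep, and the total of fw equals the number of rows in the micro-data.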
Usage
prepare_data(data, vars, fw = NULL, col.order = NULL)
Arguments
| data | A tabular object:  | 
| vars | A vector of column names or indices, or the literal  | 
| fw | (optional) Name or index of the frequency-weight column. Must reference exactly one column. If  | 
| col.order | (optional) Column(s) used to sort the final table; must be included in  | 
Value
A data.table with classes "mutual.data",
"data.frame" and "data.table".
The analytical variables are stored in the attribute "vars";
the key is cleared.
Examples
md <- prepare_data(
  DF_Seg_Chile,
  vars = c("csep", "ethnicity", "school", "district"),
  fw   = "nobs"
)
md <- prepare_data(DF_Seg_Chile, vars = "all_vars")
class(md)