dataquieR 2.5.1
- News
- fixed a bug, found by the latest R development version, caused by
parentheses in the wrong position in encapsulated function calls. This
did not cause any harm, but was nevertheless a bug.
- properly deprecated the argument threshold_value from acc_varcomp()
- loess and margins plots slightly improved
 
- Amendment to 2.5.0 news
- deprecated (and accidentally already removed) the argument
threshold_value from acc_varcomp()
 
dataquieR 2.5.0
- New features
- improved support for categorical variables, including:
- time trends w/ and w/o grouping variable
- observer/device effects
- distribution plots
 
- dq_report2() can store results on disk instead of in RAM
using the new argument storr_factory. This can help reduce memory
consumption, but we suggest using fast SSDs or NVMe drives.
- all indicator functions now create result objects with nice print
functions (visible in the Data Viewer instead of the Console window).
However, this also implies that warnings, errors and messages are
returned as part of the result object and are printed with that object.
If you want to restore the original behavior, use the option
options(dataquieR.dontwrapresults = TRUE). With
options(dataquieR.testdebug = TRUE), you can switch off
this behavior.
- dataquieR can provision your function arguments from
the metadata. In order to enable lapply and Vectorize(SIMPLIFY = FALSE)
with indicator functions, the first argument is now always resp_vars
for item-level functions. dataquieR tries to guess whether a function
that features both resp_vars and study_data as its first arguments was
called without resp_vars, i.e., only with study_data as its first
unnamed argument. If that is the case, it sets resp_vars to its default
(typically all variables). With options(dataquieR.testdebug = TRUE),
you can switch off this behavior, if needed.
- an improved version of dq_report_by, in which it is
possible to specify:
- how to split the data in parts (strata and/or variable groups)
- strata and/or variable groups to include/exclude
- how to filter observational units
- a selection of variables to analyze (resp_vars)
- one or more variables containing ID information for merging data
frames (id_vars)
 
- a new function int_encoding_errors, checking for invalid
characters in text with respect to the expected character
encoding / code page, e.g., a code point from the latin1 table is used
but the encoding is utf8, resulting in damaged text output
- a new dashboard in the General menu, Item-level data quality
dashboard, that can be used to customize data summaries
- new selection buttons in the report allow selecting the visible
columns in the displayed tables (this also applies to the export
buttons)
- support for a sheet CODE_LIST_TABLE in the metadata,
where both value label tables and missing list tables can be stated
in one table.
- support for a sheet item_computation_level in the
metadata, where variables to be computed from the provided study data
can be stated.
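Why a fixed first argument matters for iteration can be sketched in base R. The function count_missing() below is a hypothetical stand-in, not a dataquieR function; it only mimics the convention that item-level functions take resp_vars first, so that lapply() and Vectorize(SIMPLIFY = FALSE) can iterate over variable names:

```r
# Hypothetical stand-in for an item-level indicator function (not part of
# dataquieR): it follows the convention that resp_vars is the first argument.
count_missing <- function(resp_vars, study_data) {
  sum(is.na(study_data[[resp_vars]]))
}

study_data <- data.frame(v1 = c(1, NA, 3), v2 = c(NA, NA, 6))

# lapply() passes each element of its first argument as the first
# argument of the function, i.e., as resp_vars
res_lapply <- lapply(c("v1", "v2"), count_missing, study_data = study_data)

# Vectorize(SIMPLIFY = FALSE) does the same via mapply() and names the
# list elements after the variable names
count_missing_v <- Vectorize(count_missing, vectorize.args = "resp_vars",
                             SIMPLIFY = FALSE)
res_vec <- count_missing_v(c("v1", "v2"), study_data = study_data)
```

In this sketch, vectorize.args = "resp_vars" keeps Vectorize() from also iterating over the columns of study_data, which is itself a list.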
 
- Breaking changes
- moved example data from the package to our website. If you are
already using prep_get_data_frame("ship") or
prep_get_data_frame("study_data") in your code to access example data,
no change is needed. If you are still accessing example data using
system.file() (e.g., using
load(system.file("extdata", "study_data.RData", package = "dataquieR"))),
you need to switch to prep_get_data_frame(), i.e.,
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
becomes study_data <- prep_get_data_frame("study_data")
- changes in the output names:
- renamed SummaryData to ResultData (functions: acc_shape_or_scale,
acc_margins, com_segment_missingness)
- removed column GRADING from SummaryData outputs. SummaryTable outputs
still feature the column, since these are meant to be a machine-readable
interface
- con_contradictions_redcap used to return a result named SummaryTable,
while the documentation spoke about SummaryData. Alas, it should have
been VariableGroupTable in both cases. If you relied on SummaryTable in
the results of con_contradictions_redcap, you need to change your code
to use the correct output name VariableGroupTable. Also, the table has
been slightly modified.
- VariableGroupData as returned by con_contradictions_redcap is a
version optimized for human readers.
- in VariableGroupTable as returned by con_contradictions_redcap, the
column category has been renamed to CONTRADICTION_TYPE
- in con_contradictions_redcap, if summarize_categories is selected, the
result will now be in a sub-list named Other
- in prep_add_computed_variables, the column resp_vars is now named
VAR_NAMES, to be more in line with other data frames.
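If your own code or stored tables still use the old column names listed above (resp_vars in the computed-variables table, category in a saved VariableGroupTable), they can be renamed before upgrading. A minimal base-R sketch, assuming the tables are plain data frames; the helper rename_columns() and the column some_other_column are hypothetical and not part of dataquieR:

```r
# Hypothetical helper (not part of dataquieR): rename columns of a data
# frame according to a named character vector of the form c(old = "new").
rename_columns <- function(df, mapping) {
  hit <- names(df) %in% names(mapping)
  # Only names found in the mapping are replaced; all others stay as they are
  names(df)[hit] <- unname(mapping[names(df)[hit]])
  df
}

# A computed-variables table that still uses the pre-2.5.0 column name
computed <- data.frame(resp_vars = "v00001", some_other_column = "...")
computed <- rename_columns(computed, c(resp_vars = "VAR_NAMES"))

# A stored contradictions table that still uses the old column name
contradictions <- data.frame(category = "A", some_other_column = "...")
contradictions <- rename_columns(contradictions,
                                 c(category = "CONTRADICTION_TYPE"))
```

Because unmatched names are left untouched, running the helper twice on the same table is harmless.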
 
 
- Reporting
- improved buttons to export to Excel and PDF, and to print (colors
supported)
- improved rendering time by introducing thumbnails as the first visible
result in the report. Clicking on a thumbnail replaces it with plotly’s
interactive figure
- the implementations of [.dataquieR_resultset2 and [[.dataquieR_result
and related functions have changed slightly. For a report
r <- dq_report2(...), you can now call, e.g.,
r[, "com_item_missingness", "ReportSummaryTable"] to get a balloon plot
or r[, "com_item_missingness", "SummaryData"] to get a table, for all
variables that were assessed with com_item_missingness() in the report r
- if you print a list of dataquieR_result objects, these
will be combined, but due to restrictions in R, this only
works if you call print() explicitly on this list, not
with “auto-printing” (see https://stackoverflow.com/a/53983005), for
example:
  a <- lapply(c("v00001", "v00004", "v00005", "v00006"), acc_loess,
              meta_data_v2 = "meta_data_v2", study_data = "study_data")
  print(a)
works, but typing a alone does not.
You have to call print() or put lapply() in
parentheses: (lapply(...))
 
- (Indicator) Functions related
- acc_distributions() was split into acc_distributions() and
acc_distributions_ecdf() (prep_acc_distributions_with_ecdf() creates the
original plot)
- there is a new function acc_cat_distributions()
- all functions now feature:
- a meta_data_v2 argument
- new arguments as synonyms for the metadata arguments: item_level for
meta_data, segment_level for meta_data_segment, dataframe_level for
meta_data_dataframe, cross-item_level for meta_data_cross_item, and
item_computation_level for meta_data_item_computation
 
- if you call functions without label_col, label_col will now default
to LABEL, unless you set the option options(dataquieR.testdebug = TRUE)
or options(dataquieR.dontwrapresults = TRUE)
- the argument resp_vars in prep_scalelevel_from_data_and_metadata()
never worked correctly and was not used either, so it has been
deprecated. It is already non-functional, as it always was
- the function des_summary is still present, but you can
now get results for continuous or categorical variables only, using
des_summary_continuous and des_summary_categorical, respectively
- con_contradictions_redcap plot colors vary depending on CONTRADICTION_TYPES
- acc_loess() uses lowess instead of loess (both from the stats package)
 
- General
- test coverage increased, again
- fixed a bug in prep_check_for_dataquieR_updates(), so you may
need to manually install the latest beta release using
devtools::install_gitlab("libreumg/dataquieR", auth_token = NULL)
- figure sizes have been reworked in the default report
- options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "subset_u") is now the
default, assuming a one-size-fits-all metadata file (see
?dataquieR.ELEMENT_MISSMATCH_CHECKTYPE)
- fewer custom implementations of functionality already available from
rlang or withr, most prominently resulting in a faster
prep_prepare_dataframes() and rlang-compatible condition (error)
handling.
- small changes in the behavior of the dataquieR_result class, which is
now also applied to results outside a pipeline.
- many small fixes to figures
- small fixes to menu titles
- bug fixes
 
dataquieR 2.1.0
- renamed metadata column SEGMENT_ID_TABLE to SEGMENT_ID_REF_TABLE in
segment-level metadata
- scale level metadata support and heuristics
- significantly improved data quality summaries
- consolidated some of the indicator functions (limits, work in
progress)
- many minor optimizing changes
- figure sizing (work in progress), also resize handles
- improved report file structure
- improved dq_report_by file structure
- fixes, e.g., in rules, label shortening, computation speed
and cache pre-filling control
- improved Excel export from the HTML reports
- missing codes: the column CODE_INTERPRET was changed to be in
line with the AAPOR definitions, using the following
translation: PP -> P; P -> I; OH -> UO
- fixed tests
- updated concept excerpt
- excluded nominal and ordinal variables from marginal means
analysis
- improved data type handling
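If you maintain missing-code tables written for earlier versions, the CODE_INTERPRET translation listed above has to be applied in a single step, because P is both a target (PP -> P) and a source (P -> I); translating one code after another would turn PP into I. A minimal base-R sketch using a lookup vector, assuming the codes are stored in a character or factor column; the helper translate_code_interpret() and the placeholder code XX are hypothetical:

```r
# Hypothetical helper (not part of dataquieR): translate pre-2.1.0
# CODE_INTERPRET values to the AAPOR-aligned ones in a single lookup.
translate_code_interpret <- function(codes) {
  lookup <- c(PP = "P", P = "I", OH = "UO")
  codes <- as.character(codes)
  hit <- codes %in% names(lookup)  # NA and unknown codes are not matched
  # All old codes are looked up at once, so "PP" becomes "P" and is not
  # translated a second time into "I"
  codes[hit] <- unname(lookup[codes[hit]])
  codes
}

translate_code_interpret(c("PP", "P", "OH", "XX", NA))
# returns c("P", "I", "UO", "XX", NA)
```

Applied to a metadata table, this would read, e.g., tab$CODE_INTERPRET <- translate_code_interpret(tab$CODE_INTERPRET), where tab is your own (hypothetical) missing-code table.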
dataquieR 2.0.1
Reporting
- New functions prep_save_report and prep_load_report
- Update and simplification of summary overview, empty columns/rows
omitted from the matrices. Also, better classification of errors
- Many small updates in the usability of the report
- Fixes in HTML/JS output for Firefox
- Fixed report outputs that did not look as expected (in
contradiction checks and limit violations)
- Fixed mixed distribution plots called several times
- Enable auto-resizing of plot.ly-plots
- Fixed rendering problems for the new, automatically size-reduced
plots, which caused the report rendering to fail if
gginnards was installed; removed the dependency on gginnards.
- Do not show superfluous axis labels (e.g., variables, if variable
names are on an axis because these usually overlap without improving the
output)
- Prevent a warning from robustbase about doScale
- Less noisy display of conditions (e.g., warnings, errors, messages)
with the results in dq_report2 reports
- summarytools are included in dq_report2 reports, if installed.
- New report rendering code polished, parallel execution of
HTML generation prepared
- New parallel mode for dq_report2 using a queue improves
speed
- Full support for VARIABLE_ROLES in dq_report2 and suppressing helper
variable outputs in dq_report_by
- Do not show conditions (e.g., warnings, messages, errors) in reports
if they address the call of the function (e.g., “using default for
argument…”) by dq_report2 and not directly by the user
- No unit-missingness in dq_report2, because it is not so
useful in its current implementation
- More robust dq_report_by for large reports (can write
and optionally render results to disk rather than returning them)
- Bug fix in dq_report_by causing DATA_PROCESS not to work
- Fixed some errors and TODOs in dq_report_by, and added dependent
variables on the fly, but with VARIABLE_ROLE suppress:
- If no role is given, add “primary” by default for single reports as
well as for dq_report_by
- Support meta_data_v2 in dq_report_by
- FIXED: referred variables did not correctly resolve co_vars and
labels instead of variable names
 
- Several bug fixes:
- Addressed most parts of
https://gitlab.com/libreumg/dataquier/-/issues/242
 
- Addressed https://gitlab.com/libreumg/dataquier/-/issues/244 and
https://gitlab.com/libreumg/dataquier/-/issues/212
- Default for result-slot-filter was not set
(filter_result_slots in dq_report2)
- Sometimes, long labels in the first columns of a
JS-table prevented controlling the table
 
- Fixed missed check for missing cross-item level metadata and earlier
check for valid item-level metadata
- Control crude segment missingness output, so that we see it only if
there is more than one segment on the item level after the removal of
VARIABLE_ROLES-filtered items
- Outliers should work with empty metadata in
UNIVARIATE_OUTLIER_CHECKTYPE and MULTIVARIATE_OUTLIER_CHECKTYPE
- Fixed successive dates to ignore empty dates
- New functions in REDCap syntax: strictly_successive_dates and
successive_dates
- Bug fixes for REDCap rules and NA handling
and DATA_PROCESS.
- Checked that the code is in line with
https://gitlab.com/libreumg/dataquier/-/issues/243#note_1419465360
- Default for contradictions with the new syntax is now that hard
limits and missing codes are not removed. The argument
use_value_labels is not supported anymore. You can specify
the behavior on the rules level in the new cross-item-level metadata
column DATA_PREPARATION
- Compute end digit preferences only if explicitly requested by a new
item-level metadata column END_DIGIT_CHECK in dq_report2 (DATA_ENTRY_TYPE
is still supported and auto-converted). If missing, END_DIGIT_CHECK
defaults to FALSE
- Bug fix: Contradiction rules failed in specific cases if
NA values were in the data
- Bug fix: cross-item_level normalization crashed, causing rules to
fail, e.g., JUMP_LIST could be added to the item-level
metadata if missing, but causing this type of failing rules
- Bug fixes for Windows and uncommon variable names
General
- Workbooks can now be loaded from the internet (using
prep_load_workbook_like_file and the meta_data_v2 formal argument in
dq_report2), supporting http and https URLs (e.g., Excel or OpenOffice
workbooks)
- Documentation updates
dataquieR 2.0.0
- dq_report2 replaces dq_report. Please use dq_report2 from now on.
- Completely new reporting engine (needs htmltools and supports plotly)
- Better report layout and improved functionality
- Support for reading and referring to data in files/URLs
- Support for the integrity dimension in data quality report
- Included distribution and multivariate outlier (provide cross-item
level metadata for the latter) plots in data quality report
- Metadata scheme update (segment, data.frame, and
cross-item levels). No action required by users; the previous version
is still supported
- REDCap rules for contradictions (cross-item level
metadata); the previous contradictions function is still supported
- Support metadata describing segment data and study data tables
(segment and data.frame-level metadata)
- New item-level metadata version (backwards compatible)
- Support for computation of qualified missingness based on labels
from the AAPOR concept
- acc_univariate_outlier and acc_multivariate_outlier now allow
selecting the methods used to flag outliers
- Included distributional checks in the accuracy dimension for
location and proportion
- Rotation of plots can now be controlled
- Improved many figures
- Better control over warnings
- If whoami is installed, reports now show a more
suitable user name
- Many minor improvements
- Updated citations
dataquieR 1.0.13
- fixed a left-over ~ from the ggplot2 updates causing acc_margins to
fail for categorical variables
dataquieR 1.0.12
- Addressed a problem with the markdown template underlying the
dq_report reports with wrong brackets
- Addressed deprecations from ggplot2 3.4.0
- Added ORCIDs for two authors
- Updated the CITATION file
- Updated the README.md file, adding the funding
sources.
dataquieR 1.0.11
- Addressed a problem with some test platforms
- Added funding agencies in the manual
dataquieR 1.0.10
- Fixed NEWS.md file
- Fixed documentation
dataquieR 1.0.9
- Fixed a bug in sigmagap and made missing guessing more
robust.
- Fixed checks on missing code detection failing for
logical.
- Fixed a damaged check for numeric threshold values in
acc_margins.
- Fixed wrongly named GRADING columns.
- Improved parallel execution by automatic detection of cores.
- Tidy html dependency
dataquieR 1.0.8
- Removed formal arguments from rbind.ReportSummaryTable, since these
are not needed anyway and the documentation for those arguments,
inherited from rbind in base, contains an invalid URL triggering a
NOTE.
dataquieR 1.0.7
- Fixed bugs in example metadata.
- Figures now have size hints as attributes.
- Added simple type conversion check indicator function of dimension
integrity, int_datatype_matrix.
- Corrected some error classifications
- prep_study2meta can now also convert factors to dataquieR-compatible
meta_data/study_data
- Slightly improved documentation.
- Bug fix in com_item_missingness for textual response
variables.
- Added new output slot with heat-map like tables. Implemented some
generics for those.
dataquieR 1.0.6
- Robustness: Ensure DT JS is always loaded when a dq_report report is
rendered
- Bug fix: More robust handling of DECIMALS variable attribute, if
this is delivered as a character.
- Bug Fix: com_segment_missingness with strata_vars/group_vars did not
work
- Bug Fix: If label_col was set to something other than LABEL,
strata_vars did not work for com_unit_missingness
- More precise documentation.
- Fixed a bug in a utility function for the univariate outliers
indicator function, which caused many data points to be flagged as
outliers by the sigma-gap criterion.
- Made outlier function aware of too many non-outlier points causing
too complex graphics (e.g. pdf rendering crashes the PDF reader).
- Fixes and small improvements in dq_report.
- Switched from cowplot to patchwork in acc_margins, yielding figures
that can be manipulated more easily. Please note that this change could
break existing output manipulations, since the structure of the margins
plots has changed internally. However, output manipulations were hardly
possible for margins plots before, so it is unlikely that any pipelines
are affected.
- More control over the output of the acc_loess function.
- More robust prep_create_meta, handling length-0
arguments by ignoring these variable attributes entirely.
- Added a classification system for warnings and error messages to
distinguish errors based on mismatching variables for a function from
other error messages.
- JOSS
- Some tidy up and more tests.
dataquieR 1.0.5
- Fixed two bugs in con_inadmissible_categorical (only one resp_var, and
value limits all the same for all resp_vars)
- Changed LICENSE to BSD-2
- Slightly updated documentation
- Updated README-File
dataquieR 1.0.4
- Fixed CITATION, a broken reference in Rd and a problem with the
vignette on pandoc-less systems
- Improved an inaccurate argument description for multivariate
outliers
- Fixed a problem with error messages, if a dataquieR function was
called by a generated function f that lives in an environment directly
inheriting from the empty environment, e.g.,
environment(f) <- new.env(parent = emptyenv()).
- Marked some examples as dontrun, because they sometimes
caused NOTEs on rhub.
dataquieR 1.0.3
- Addressed all comments by the CRAN reviewers, thank you.
dataquieR 1.0.2
- Bug Fix: If an empty data frame was delivered in the
SummaryTable entry of a result within a dq_report output, the summary
and print generics did not work on the report.
dataquieR 1.0.1
- Skipping some of the slower tests on CRAN now. On my local system, a
full devtools::check(cran = TRUE, env_vars = c(NOT_CRAN = "false"))
now takes 2:22 minutes.
dataquieR 1.0.0
- Initial CRAN release candidate