run_schedule() simplified. Counts of
skipped pipelines removed to avoid confusion with the
@maestroSkip tag. ‘Total’ now refers to the total number of
pipelines invoked in a run, not all pipelines in the project.
maestro::invoke() only runs the selected pipeline or
DAG rather than accidentally running all pipelines in the schedule
(#161).
maestro::invoke() on a DAG pipeline successfully
executes the full DAG (#156).
maestro::invoke() now properly passes resources
(arguments) to pipelines (#157).
The number of errors reported by get_status() is now
accurate.
New maestroFlags tag that allows passing arbitrary
pipeline tags, which is useful for documentation and labeling. Tags are
now accessible via get_flags().
New function get_slot_usage to help identify the
number of pipelines running on a scheduled time slot.
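To make "slot usage" concrete, here is a minimal Python sketch (not maestro's R implementation) that counts how many pipelines land on each scheduled time slot. The dictionary of pipeline names to run times and the slot_usage() helper are hypothetical. The real get_slot_usage() arguments and return format may differ.

```python
from collections import Counter
from datetime import datetime


def slot_usage(pipelines: dict[str, list[datetime]]) -> Counter:
    """Count how many pipelines are scheduled on each time slot."""
    usage = Counter()
    for run_times in pipelines.values():
        # Use a set so each pipeline counts at most once per slot
        usage.update(set(run_times))
    return usage


# Hypothetical pipelines mapped to their scheduled run times
pipelines = {
    "extract": [datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 1)],
    "transform": [datetime(2024, 1, 1, 1)],
    "report": [datetime(2024, 1, 1, 1), datetime(2024, 1, 1, 2)],
}
usage = slot_usage(pipelines)
print(usage[datetime(2024, 1, 1, 1)])  # 3
```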
New maestroPriority tag that determines the order in
which pipelines on the same schedule instance execute. Uses integer
values from 1-N where 1 is the highest priority.
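The ordering rule can be sketched in Python as a stable sort on the priority value. The Pipeline class and execution_order() helper are hypothetical. Keeping the original order for tied priorities is an assumption of this sketch, not documented maestro behavior.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    name: str
    priority: int  # 1 is the highest priority


def execution_order(pipelines: list[Pipeline]) -> list[str]:
    """Order pipelines sharing a schedule instance by priority (1 runs first)."""
    # sorted() is stable, so equal priorities keep their original relative order
    return [p.name for p in sorted(pipelines, key=lambda p: p.priority)]


slot = [Pipeline("load", 3), Pipeline("extract", 1), Pipeline("transform", 2)]
print(execution_order(slot))  # ['extract', 'transform', 'load']
```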
maestroStartTime now accepts a timestamp formatted
like HH:MM:SS. This is useful for pipelines running on a daily or hourly
frequency because the date is often arbitrary in those cases
(#143).
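The following minimal Python sketch shows one way to handle a time-only start. A bare HH:MM:SS value is attached to an anchor date, and the schedule is generated from there. The parse_start_time() helper and the choice of anchor date are hypothetical. This entry does not say which date maestro actually uses.

```python
from datetime import date, datetime, timedelta


def parse_start_time(value: str, anchor: date) -> datetime:
    """Accept either 'YYYY-MM-DD HH:MM:SS' or a bare 'HH:MM:SS'."""
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Time-only form: the date is arbitrary, so attach the (hypothetical) anchor date
        t = datetime.strptime(value, "%H:%M:%S").time()
        return datetime.combine(anchor, t)


start = parse_start_time("06:30:00", anchor=date(2024, 3, 1))
daily = [start + timedelta(days=i) for i in range(3)]
print([d.isoformat() for d in daily])
# ['2024-03-01T06:30:00', '2024-03-02T06:30:00', '2024-03-03T06:30:00']
```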
logging and log_file in
run_schedule(), which were deprecated in maestro 0.5.0, are
now fully removed.
n_pipelines attribute of
<MaestroPipelineList> now corresponds correctly to
the number of pipelines.
create_pipeline no longer adds extra line breaks
where optional tags would be.
build_schedule() to address CRAN
check.
Pipeline schedule sequences are now stored internally inside
<MaestroPipeline> objects instead of being generated during
run_schedule(). This has implications when caching a
schedule, as the sequence only extends 3 years in advance.
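The Python sketch below illustrates the caching implication. It generates a daily sequence that stops 3 years after the start, so a schedule cached in 2024 has no invocations left in 2027. The build_sequence() helper is an assumption of this sketch, as is approximating "3 years" as 3 * 365 days.

```python
from datetime import datetime, timedelta

# Assumption: "3 years" approximated as 3 * 365 days
HORIZON = timedelta(days=3 * 365)


def build_sequence(start: datetime, frequency: timedelta) -> list[datetime]:
    """Precompute invocation times from start up to a fixed horizon."""
    end = start + HORIZON
    seq, t = [], start
    while t <= end:
        seq.append(t)
        t += frequency
    return seq


seq = build_sequence(datetime(2024, 1, 1), timedelta(days=1))
print(seq[-1])  # 2026-12-31 00:00:00
# A cached copy of this schedule has nothing left to run once the horizon passes
print(any(t >= datetime(2027, 1, 1) for t in seq))  # False
```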
Performance improvements to run_schedule().
Specifying maestroHours, maestroDays,
maestroMonths now correctly adopts the time zone specified
in maestroTz (#141).
When using non-UTC time zones, whether maestroStartTime falls
within daylight saving time is used to adjust the sequence
so that invocations occur on the same time interval.
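One way to read "the same time interval" is that runs keep the same local clock time across a DST change. The Python sketch below uses the standard zoneinfo module rather than anything from maestro. It contrasts fixed 24-hour steps in UTC with wall-clock steps in America/New_York across the March 2024 transition. The time zone and dates are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")
start = datetime(2024, 3, 9, 9, 0, tzinfo=tz)  # the day before DST begins

# Fixed 24-hour steps in UTC: the local clock time drifts after the DST change
utc_steps = [
    (start.astimezone(timezone.utc) + timedelta(days=i)).astimezone(tz)
    for i in range(3)
]

# Wall-clock steps: adding days to an aware datetime keeps the local time
wall_steps = [start + timedelta(days=i) for i in range(3)]

print([d.strftime("%H:%M %Z") for d in utc_steps])   # ['09:00 EST', '10:00 EDT', '10:00 EDT']
print([d.strftime("%H:%M %Z") for d in wall_steps])  # ['09:00 EST', '09:00 EDT', '09:00 EDT']
```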
Other time zone fixes for checks where maestroTz
differs from the system time zone.
maestroHours was previously only valid when
maestroFrequency was specified as ‘hourly’, but now ‘1
hour’ is also acceptable (the same applies to the other specifier
tags).
maestroFrequency of multiple weeks (e.g., 2 weeks) is
invalid.
Pipeline errors, warnings, and messages can now be printed to the
console using run_schedule(log_to_console = TRUE). These
logs will be interwoven between messages created by maestro
(#130).
run_schedule() gains log_to_file
argument to specify whether to log to a file (replaces
logging and log_file arguments).
run_schedule() arguments logging and
log_file are deprecated. Use
log_to_file = TRUE to log to a generic maestro.log file or
log_to_file = '[path-to-your-log-file]' to log to a
specific text file.
run_schedule() now warns if the unit of
orch_frequency is a lower frequency than that of the highest-frequency
pipeline in the project.
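The following rough Python sketch implements that check. As the entry describes, it compares only the units of the frequency strings. Several parts are assumptions rather than maestro's code: the unit ranking, the unit_of() parser, and the warning text. The parser handles only numeric forms like "2 hours", not named values like "hourly".

```python
import warnings

# Assumed unit names, ordered from highest to lowest frequency
UNIT_RANK = {"minute": 0, "hour": 1, "day": 2, "week": 3, "month": 4, "quarter": 5, "year": 6}


def unit_of(frequency: str) -> str:
    """Extract the singular unit from a string like '1 day' or '2 hours'."""
    return frequency.split()[-1].rstrip("s")


def check_orch_frequency(orch_frequency: str, pipeline_frequencies: list[str]) -> bool:
    """Warn (and return False) if the orchestrator's unit is coarser than the fastest pipeline's."""
    fastest = min(pipeline_frequencies, key=lambda f: UNIT_RANK[unit_of(f)])
    if UNIT_RANK[unit_of(orch_frequency)] > UNIT_RANK[unit_of(fastest)]:
        warnings.warn(
            f"orch_frequency '{orch_frequency}' is a lower frequency than pipeline frequency '{fastest}'"
        )
        return False
    return True


print(check_orch_frequency("1 hour", ["1 day", "2 hours"]))  # True
```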
run_schedule() enforces a minimum
orch_frequency of 1 year (e.g., ‘2 years’ or more no longer
valid).
Message and warning counts are now properly displayed in the
status and output of run_schedule(), as well as in
last_run_messages() and last_run_warnings(),
even if they are below the maestroLogLevel.
Fixed display of run_schedule() to show more
accurate next run times for pipelines. This issue was evident when
running the orchestrator at a frequency of daily or lower.
Fixed issue where pipelines with a dependency would run on a time schedule even if the upstream pipeline didn’t run (and vice versa).
Fixed output of next scheduled pipelines to better reflect DAG structures.
Directed acyclic graph (DAG) pipelines - where the output of one
pipeline can feed into another - are now available using the
maestroOutputs and maestroInputs tags.
Pipelines that input into a downstream pipeline should use the
maestroOutputs tag. Pipelines that receive input from an
upstream pipeline should use the maestroInputs tag
(#98).
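The sketch below, in Python rather than R, models how the two tags form a DAG. Each pipeline's maestroOutputs names its downstream pipelines. Pipelines run in dependency order, and upstream results are passed to their children. Several parts are hypothetical: the pipeline names, the run_dag() helper, and the string that stands in for a pipeline's return value. Starting from a chosen root pipeline is a choice of this sketch.

```python
from graphlib import TopologicalSorter

# Hypothetical pipelines: name -> names listed in its maestroOutputs tag
dag_outputs = {
    "get_data": ["clean_data"],
    "clean_data": ["summarise", "plot"],
    "summarise": [],
    "plot": [],
}


def run_dag(outputs: dict[str, list[str]], root: str) -> dict[str, str]:
    """Run root and everything downstream of it, feeding each result to its children."""
    # Collect every pipeline reachable from root
    reachable, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(outputs.get(node, []))
    # TopologicalSorter expects node -> predecessors, so invert the output edges
    graph = {n: {p for p in reachable if n in outputs.get(p, [])} for n in reachable}
    results = {}
    for node in TopologicalSorter(graph).static_order():
        inputs = [results[p] for p in sorted(graph[node])]
        # Stand-in for running the pipeline with its upstream inputs
        results[node] = f"{node}({', '.join(inputs)})"
    return results


print(run_dag(dag_outputs, "get_data")["summarise"])  # summarise(clean_data(get_data()))
```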
New function show_network for visualizing the
connections between pipelines that are connected in a DAG.
MaestroSchedule gains new methods
get_network() (returns a data.frame) and
show_network() (returns a visualization using
{DiagrammeR}).
Added catch-all maestro tag to identify a function
as a pipeline without specifying other configurations.
create_pipeline to allow for
interactive creation of pipelines that default to skip.
Fixed issue with suggest_orch_frequency when using
different styles of frequency (e.g., 1 day vs. daily) in a single
schedule.
Fixed issue where pipeline sourcing failures were appearing as successful runs in status outputs.
This version refactors much of the code base to rely on R6 classes for pipelines and schedules. Pay careful attention to the breaking changes to see how existing code may be impacted.
Schedules are now represented as an R6 object of class
<MaestroSchedule>. build_schedule()
returns a MaestroSchedule object that can be passed to
run_schedule() as normal. To access the schedule table run
get_schedule().
run_schedule() no longer returns a list of
$status and $artifacts but now
returns/modifies the MaestroSchedule object. Status can be accessed
using get_status(schedule) and artifacts via
get_artifacts(schedule).
suggest_orch_frequency() now takes a
<MaestroSchedule> object.
Data example_schedule removed from the
package.
Skipped pipelines are no longer shown in the CLI output of
run_schedule().
All pipeline names must now be unique: the name of each maestro
pipeline function must be unique across the project to
support the implementation of DAGs. build_schedule() will
abort if any non-unique names are detected.
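The following minimal Python sketch shows the uniqueness check. maestroOutputs and maestroInputs refer to pipelines by name, so a duplicated name would presumably make a DAG edge ambiguous. The check_unique_names() helper and its error message are hypothetical.

```python
from collections import Counter


def check_unique_names(names: list[str]) -> None:
    """Raise if any pipeline name appears more than once, listing the duplicates."""
    duplicates = sorted(name for name, n in Counter(names).items() if n > 1)
    if duplicates:
        raise ValueError(f"Pipeline names must be unique; duplicated: {', '.join(duplicates)}")


check_unique_names(["extract", "transform", "load"])  # passes silently
```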
Added functions get_schedule(),
get_status(), and get_artifacts() for
interacting with <MaestroSchedule> objects.
Added function invoke() to instantly run a pipeline
in a schedule.
New tags maestroHours, maestroDays, and
maestroMonths allow running pipelines on specific hours
of day, days of week, days of month, or months of year (#100).
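As an illustration, the Python sketch below filters hourly candidate slots down to specific hours of day and days of week. The filter_slots() helper and the three-letter day names are assumptions about the tag values. Day-of-month and month filters would follow the same pattern but are left out for brevity.

```python
from datetime import datetime, timedelta

# Assumed day labels; list index matches datetime.weekday() (Monday = 0)
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def filter_slots(slots, hours=None, days=None):
    """Keep slots whose hour and weekday are allowed (None means no restriction)."""
    return [
        s for s in slots
        if (hours is None or s.hour in hours)
        and (days is None or DAY_NAMES[s.weekday()] in days)
    ]


# Hourly candidate slots over two days, starting Monday 2024-01-01 00:00
slots = [datetime(2024, 1, 1) + timedelta(hours=h) for h in range(48)]
picked = filter_slots(slots, hours={9, 17}, days={"Mon"})
print([s.isoformat() for s in picked])  # ['2024-01-01T09:00:00', '2024-01-01T17:00:00']
```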
maestroFrequency tag now accepts the values hourly,
daily, weekly, biweekly, monthly, quarterly, and yearly. Argument
orch_frequency to run_schedule() also accepts
these values.
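The following minimal Python sketch normalizes the named values into the "count unit" form, so that, for example, "hourly" and "1 hour" compare equal. The mapping is an assumption of this sketch, in particular treating quarterly as 3 months. Biweekly is left out because its exact equivalent is not spelled out here.

```python
# Assumed equivalents for the named values (biweekly omitted: its equivalent is not stated)
NAMED = {
    "hourly": "1 hour",
    "daily": "1 day",
    "weekly": "1 week",
    "monthly": "1 month",
    "quarterly": "3 months",
    "yearly": "1 year",
}


def normalize_frequency(value: str) -> tuple[int, str]:
    """Turn 'daily' or '3 days' into (count, singular unit)."""
    cleaned = value.strip().lower()
    cleaned = NAMED.get(cleaned, cleaned)
    count, unit = cleaned.split()
    return int(count), unit.rstrip("s")


print(normalize_frequency("hourly"))  # (1, 'hour')
print(normalize_frequency("1 hour"))  # (1, 'hour')
```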
Changed the pipeline in the example_schedule data with
a schedule of 1 minute to 30 minutes, in keeping with best practices for
minimum pipeline frequency.
suggest_orch_frequency now uses the smallest
interval between any two pipelines (#99).
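The Python sketch below illustrates the smallest-interval idea. It pools every pipeline's scheduled times and takes the smallest gap between consecutive distinct times. This shows why an offset pipeline can call for a finer orchestrator than any single pipeline's own frequency. The smallest_interval() helper is hypothetical, and the sketch does not model how maestro turns the interval into a frequency string.

```python
from datetime import datetime, timedelta


def smallest_interval(run_times: dict[str, list[datetime]]) -> timedelta:
    """Smallest gap between any two distinct scheduled times across all pipelines."""
    times = sorted({t for ts in run_times.values() for t in ts})
    if len(times) < 2:
        raise ValueError("need at least two distinct invocation times")
    return min(b - a for a, b in zip(times, times[1:]))


base = datetime(2024, 1, 1)
schedule = {
    "hourly_pipe": [base + timedelta(hours=h) for h in range(3)],
    "offset_pipe": [base + timedelta(minutes=30)],
}
print(smallest_interval(schedule))  # 0:30:00
```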
Error messages on unintentional overwrites from
create_*() functions now correctly reference the name of the path or
directory that was to be overwritten.
Fixed CLI output of run_schedule() to not show
skipped pipelines in the next run portion.
Fixed CLI output to correctly handle counting of successful runs when pipelines are skipped.
Performance improvements to build_schedule()
(#101).
Creator functions create_pipeline() and
create_maestro() no longer have default arguments for the
path to where the scripts are created. Users must explicitly define
these paths.
Argument log_file in run_schedule() no
longer defaults to ./maestro.log but instead defaults to
NULL.
create_*() functions now take a boolean
overwrite argument to make the overwriting of existing
pipelines, projects, and orchestrators more explicit.
run_schedule() now returns a list with status and
artifacts instead of just a data.frame of the status. Artifacts are any
values returned from pipelines. Pipelines that return nothing will have
no artifacts.
Added function suggest_orch_frequency() to provide
a suggestion of what frequency to use for the orchestrator.
run_schedule() now correctly outputs
the total number of pipelines (#81) and the correct number of
errors.
maestroFrequency tag now adheres to a more human-readable format like “1 day”, “2 hours”, “4 weeks”, etc.
orch_frequency argument in
run_schedule() also takes more human-readable format
identical to maestroFrequency tag.
maestroInterval tag removed
orch_interval argument to
run_schedule() removed.
create_maestro() and
create_orchestrator() now use the argument
type instead of extension for defining what
script type to use for the orchestrator.
Changed last_parsing_errors() to
last_build_errors(); changed functions of the form
last_runtime_*() to last_run_*().
Additional columns added to the output of
run_schedule(): pipeline_started and
pipeline_ended to indicate the start and end times of a
pipeline execution; next_run to indicate when the next run
should occur, based on the frequency of the pipeline and the
orchestrator.
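The following minimal Python sketch covers the pipeline-frequency part of a next_run value: the first scheduled time strictly after the current time. This entry does not describe how the orchestrator's frequency is factored in, so the sketch leaves it out. The next_run() helper is hypothetical.

```python
from datetime import datetime, timedelta


def next_run(start: datetime, frequency: timedelta, now: datetime) -> datetime:
    """First time start + k * frequency (k >= 0) strictly after now."""
    if now < start:
        return start
    # timedelta // timedelta gives the number of whole intervals already elapsed
    elapsed = (now - start) // frequency
    return start + (elapsed + 1) * frequency


start = datetime(2024, 1, 1)
print(next_run(start, timedelta(hours=6), datetime(2024, 1, 1, 13, 15)))  # 2024-01-01 18:00:00
```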
Pipelines now show as skipped if they are not scheduled.
Added hex logo
Backend improvements to schedule checking
Timestamps are formatted in the specified time zone.
run_schedule() CLI output suggests using
last_run_errors() or last_run_warnings() if
any errors or warnings were found.