US 12,242,543 B1
Configuration-based development of analytics pipelines
Gurudatta Pai, Matthews, NC (US); Menglin Cao, Danville, CA (US); Prahalad Thota, San Ramon, CA (US); Thomas Mann, Charlotte, NC (US); Braxton Meyer, Grafton, WI (US); and Ravindra Reddy Pathakota, Tega Cay, SC (US)
Assigned to Wells Fargo Bank, N.A., San Francisco, CA (US)
Filed by Wells Fargo Bank, N.A., San Francisco, CA (US)
Filed on Jan. 15, 2021, as Appl. No. 17/150,064.
Int. Cl. G06F 8/70 (2018.01); G06F 8/71 (2018.01); G06F 9/48 (2006.01); G06F 16/907 (2019.01); G06N 5/025 (2023.01); G06F 16/2457 (2019.01); G06F 40/268 (2020.01); H04L 67/565 (2022.01)

CPC G06F 16/907 (2019.01) [G06F 8/71 (2013.01); G06F 9/4843 (2013.01); G06N 5/025 (2013.01); G06F 16/2457 (2019.01); G06F 40/268 (2020.01); H04L 67/565 (2022.05)]

12 Claims

1. A method comprising:

accessing, by a computing system, metadata identifying characteristics of a data flow pipeline, wherein the metadata includes a linear list of descriptions of a plurality of stages in the data flow pipeline, and wherein each of the descriptions in the linear list of descriptions is ordered in the metadata pursuant to a set of ordering rules;

constructing, by the computing system and based on the metadata, the data flow pipeline, wherein the data flow pipeline includes a plurality of stages, wherein the plurality of stages includes at least one single-sourced stage and a multi-sourced stage in which data output by each of a subset of stages in the plurality of stages are used as input to the multi-sourced stage, and wherein constructing the data flow pipeline includes interpreting the linear list of descriptions in the metadata pursuant to the ordering rules so that (1) the description of each single-sourced stage receiving a single input in the pipeline is immediately preceded by a description of the stage that produces the single input, and (2) the description of the multi-sourced stage is preceded by a description, in a relative order, of each of the subset of stages that produce an input to the multi-sourced stage, wherein the relative order corresponds to an order in which each of the inputs to the multi-sourced stage is received by the multi-sourced stage;

monitoring, by the computing system, changes to the metadata, wherein monitoring the changes to the metadata includes monitoring a storage device storing the metadata;

processing, by the computing system, a first set of data using the data flow pipeline;

detecting, by the computing system, modifications to the metadata resulting in updated metadata, wherein detecting modifications to the metadata includes detecting addition of a reference to a first component in the metadata and detecting removal of a reference to a second component in the metadata;

constructing, by the computing system and based on the updated metadata, an updated data flow pipeline, wherein constructing the updated data flow pipeline includes interpreting the updated metadata by applying the ordering rules to the updated metadata to:

add a stage to the pipeline corresponding to the addition of the reference to the first component, and

remove a stage from the pipeline corresponding to the removal of the reference to the second component; and

processing, by the computing system, a second set of data using the updated data flow pipeline.
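
The ordering rules and the rebuild-on-change steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how the linear list is interpreted, and the stack-based reading used here is only one interpretation consistent with rules (1) and (2). The description keys `name` and `inputs`, the functions `build_pipeline` and `diff_references`, and the example stage names are all hypothetical. Monitoring of the storage device is omitted; the sketch assumes the old and new metadata have already been loaded.

```python
def build_pipeline(descriptions):
    """Interpret a linear list of stage descriptions (hypothetical format).

    Each description is a dict with a unique "name" and an "inputs" count:
    0 for a source, 1 for a single-sourced stage, 2+ for a multi-sourced stage.
    Returns a dict mapping each stage name to the ordered list of stages
    that feed it.
    """
    pending = []   # outputs not yet consumed, oldest first (assumed reading)
    pipeline = {}
    for desc in descriptions:
        name, n = desc["name"], desc["inputs"]
        if name in pipeline:
            raise ValueError(f"duplicate stage name: {name}")
        if n > len(pending):
            raise ValueError(f"stage {name} needs {n} inputs, "
                             f"only {len(pending)} available")
        cut = len(pending) - n
        # The last n pending outputs, in list order, become this stage's
        # inputs. For n == 1 this is always the immediately preceding stage,
        # because every stage appends its own output below.
        pipeline[name] = pending[cut:]
        del pending[cut:]
        pending.append(name)
    return pipeline


def diff_references(old, new):
    """Return (added, removed) stage names between two metadata versions."""
    old_names = {d["name"] for d in old}
    new_names = {d["name"] for d in new}
    return sorted(new_names - old_names), sorted(old_names - new_names)


# Original metadata: two sources joined by a multi-sourced stage.
metadata_v1 = [
    {"name": "read_a", "inputs": 0},
    {"name": "clean_a", "inputs": 1},
    {"name": "read_b", "inputs": 0},
    {"name": "join", "inputs": 2},
    {"name": "write", "inputs": 1},
]

# Updated metadata: a reference to "dedupe" is added, "write" is removed.
metadata_v2 = [
    {"name": "read_a", "inputs": 0},
    {"name": "clean_a", "inputs": 1},
    {"name": "dedupe", "inputs": 1},
    {"name": "read_b", "inputs": 0},
    {"name": "join", "inputs": 2},
]

pipeline_v1 = build_pipeline(metadata_v1)
added, removed = diff_references(metadata_v1, metadata_v2)
if added or removed:
    # Rebuild by re-applying the same ordering rules to the updated list.
    pipeline_v2 = build_pipeline(metadata_v2)
```

In this sketch, `join` receives `clean_a` and `read_b` in that order in the first pipeline, and `dedupe` and `read_b` in the rebuilt one. The updated pipeline is produced by re-interpreting the whole list rather than by patching the existing graph, a simplification chosen to keep the example short.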