Flink – WindowedStream
Published: 2019-06-18


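A WindowedStream supports operations such as reduce, aggregate, min, and max.

The key is to understand how WindowOperator uses keyed (KV) state, because that state is where the window buffer is stored.

Different state types behave differently, for example ReducingState versus ListState.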
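To make the difference concrete, here is a minimal plain-Java sketch with no Flink dependency. ReducingBuffer and ListBuffer are hypothetical stand-ins invented for this illustration, not Flink classes. They only mimic the idea that a reducing state keeps one running value while a list state keeps every element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

/** Toy comparison of two window-buffer strategies; these are not Flink's classes. */
final class WindowBufferSketch {

    /** Keeps only the running result, in the spirit of a reducing state. */
    static final class ReducingBuffer<T> {
        private final BinaryOperator<T> reducer;
        private T current; // null until the first element arrives

        ReducingBuffer(BinaryOperator<T> reducer) {
            this.reducer = reducer;
        }

        void add(T value) {
            current = (current == null) ? value : reducer.apply(current, value);
        }

        T get() {
            return current;
        }

        int storedElements() {
            return current == null ? 0 : 1;
        }
    }

    /** Keeps every element, in the spirit of a list state. */
    static final class ListBuffer<T> {
        private final List<T> elements = new ArrayList<>();

        void add(T value) {
            elements.add(value);
        }

        List<T> get() {
            return elements;
        }

        int storedElements() {
            return elements.size();
        }
    }

    public static void main(String[] args) {
        ReducingBuffer<Long> reducing = new ReducingBuffer<>(Long::sum);
        ListBuffer<Long> list = new ListBuffer<>();
        for (long v = 1; v <= 5; v++) {
            reducing.add(v);
            list.add(v);
        }
        System.out.println("reducing: " + reducing.get() + ", stored=" + reducing.storedElements());
        System.out.println("list: " + list.get() + ", stored=" + list.storedElements());
    }
}
```

The reducing buffer never stores more than one value per window, which is why it is cheaper, but it also means individual elements can no longer be inspected or evicted.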

 

Reduce

 

/**
 * Applies the given window function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the window function is
 * interpreted as a regular non-windowed stream.
 *
 * <p>Arriving data is incrementally aggregated using the given reducer.
 *
 * @param reduceFunction The reduce function that is used for incremental aggregation.
 * @param function The window function.
 * @param resultType Type information for the result type of the window function.
 * @param legacyWindowOpType When migrating from an older Flink version, this flag indicates
 *                           the type of the previous operator whose state we inherit.
 * @return The data stream that is the result of applying the window function to the window.
 */
private <R> SingleOutputStreamOperator<R> reduce(
        ReduceFunction<T> reduceFunction,
        WindowFunction<T, R, K, W> function,
        TypeInformation<R> resultType,
        LegacyWindowOperatorType legacyWindowOpType) {

    String opName;
    KeySelector<T, K> keySel = input.getKeySelector();

    OneInputStreamOperator<T, R> operator;

    if (evictor != null) {
        @SuppressWarnings({"unchecked", "rawtypes"})
        TypeSerializer<StreamRecord<T>> streamRecordSerializer =
                (TypeSerializer<StreamRecord<T>>) new StreamElementSerializer(
                        input.getType().createSerializer(getExecutionEnvironment().getConfig()));

        // With an evictor the state is a ListState: the whole window has to be
        // buffered, otherwise there would be nothing to evict from.
        ListStateDescriptor<StreamRecord<T>> stateDesc =
                new ListStateDescriptor<>("window-contents", streamRecordSerializer);

        // This is how the operator name for reduce is assembled; it exposes the
        // complete window configuration.
        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + evictor + ", " + udfName + ")";

        operator = new EvictingWindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalIterableWindowFunction<>(new ReduceApplyWindowFunction<>(reduceFunction, function)),
                trigger,
                evictor,
                allowedLateness);
    } else {
        // Without an evictor a ReducingState is used: the full list does not
        // have to be buffered, so this path is more efficient.
        ReducingStateDescriptor<T> stateDesc = new ReducingStateDescriptor<>("window-contents",
                reduceFunction, // the reduce logic
                input.getType().createSerializer(getExecutionEnvironment().getConfig()));

        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + udfName + ")";

        operator = new WindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalSingleValueWindowFunction<>(function),
                trigger,
                allowedLateness,
                legacyWindowOpType);
    }

    return input.transform(opName, resultType, operator);
}

 

reduceFunction holds the reduce logic; usually it is the only argument you pass.

The other two parameters are:

WindowFunction<T, R, K, W> function

TypeInformation<R> resultType

/**
 * Applies a reduce function to the window. The window function is called for each evaluation
 * of the window for each key individually. The output of the reduce function is interpreted
 * as a regular non-windowed stream.
 */

function is a WindowFunction that is called when the window fires, and resultType is the type of the values that WindowFunction emits. After reduce, the WindowedStream becomes a regular non-windowed stream.

/**
 * Emits the contents of the given window using the {@link InternalWindowFunction}.
 */
@SuppressWarnings("unchecked")
private void emitWindowContents(W window, ACC contents) throws Exception {
    timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp());
    userFunction.apply(context.key, context.window, contents, timestampedCollector);
}

As you can see, the WindowFunction is called once for every window of every key.

public void onEventTime(InternalTimer<K, W> timer) throws Exception {
    TriggerResult triggerResult = context.onEventTime(timer.getTimestamp());
    if (triggerResult.isFire()) {
        emitWindowContents(context.window, contents); // called when the window fires
    }
}

context.window holds the window's metadata; a TimeWindow, for example, records its start and end time.

contents is the window state, which holds the actual data.
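The per-key, per-window behavior can be sketched in plain Java as follows. Everything here (PerKeyWindowSketch, its TimeWindow, WindowFn, and the choice of a running sum as window contents) is hypothetical and written only for this illustration; it is not Flink's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class PerKeyWindowSketch {

    /** Window metadata only: start (inclusive) and end (exclusive) timestamps. */
    static final class TimeWindow {
        final long start;
        final long end;

        TimeWindow(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long maxTimestamp() {
            return end - 1;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof TimeWindow
                    && ((TimeWindow) o).start == start
                    && ((TimeWindow) o).end == end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end);
        }
    }

    /** Hypothetical window function: gets the key, the window metadata and the contents. */
    interface WindowFn<OUT> {
        OUT apply(String key, TimeWindow window, long contents);
    }

    // Contents are addressed by key first, then by window.
    private final Map<String, Map<TimeWindow, Long>> contents = new HashMap<>();

    void add(String key, TimeWindow window, long value) {
        contents.computeIfAbsent(key, k -> new HashMap<>()).merge(window, value, Long::sum);
    }

    /** Fires a single (key, window) pair; other keys and windows are not touched. */
    <OUT> OUT fire(String key, TimeWindow window, WindowFn<OUT> fn) {
        Map<TimeWindow, Long> perWindow = contents.get(key);
        if (perWindow == null || !perWindow.containsKey(window)) {
            throw new IllegalStateException("no contents for key " + key);
        }
        return fn.apply(key, window, perWindow.get(window));
    }

    public static void main(String[] args) {
        PerKeyWindowSketch store = new PerKeyWindowSketch();
        TimeWindow w = new TimeWindow(0, 10);
        store.add("a", w, 1);
        store.add("a", w, 2);
        store.add("b", w, 10);
        String out = store.fire("a", w, (k, win, c) -> k + "@" + win.maxTimestamp() + "=" + c);
        System.out.println(out);
    }
}
```

Firing key "a" only sees the values added for "a" in that window; the value stored for key "b" in the same window stays separate.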

 

If no WindowFunction is specified, the default is PassThroughWindowFunction:

public class PassThroughWindowFunction<K, W extends Window, T> implements WindowFunction<T, T, K, W> {

    private static final long serialVersionUID = 1L;

    @Override
    public void apply(K k, W window, Iterable<T> input, Collector<T> out) throws Exception {
        for (T in : input) {
            out.collect(in);
        }
    }
}

 

Continuing with WindowOperator:

@Override
public void processElement(StreamRecord<IN> element) throws Exception {
    for (W window : elementWindows) { // for every window the element was assigned to
        // drop if the window is already late
        if (isLate(window)) {
            continue;
        }
        windowState.setCurrentNamespace(window);
        windowState.add(element.getValue()); // add the element's value
        ...

 

windowState is initialized in WindowOperator.open:

public void open() throws Exception {
    // create (or restore) the state that hold the actual window contents
    // NOTE - the state may be null in the case of the overriding evicting window operator
    if (windowStateDescriptor != null) {
        windowState = (InternalAppendingState<W, IN, ACC>) getOrCreateKeyedState(windowSerializer, windowStateDescriptor);
    }
    ...

 

AbstractStreamOperator

protected <N, S extends State, T> S getOrCreateKeyedState(
        TypeSerializer<N> namespaceSerializer,
        StateDescriptor<S, T> stateDescriptor) throws Exception {

    if (keyedStateStore != null) {
        return keyedStateBackend.getOrCreateKeyedState(namespaceSerializer, stateDescriptor);
    }
    ...

 

AbstractKeyedStateBackend

public <N, S extends State, V> S getOrCreateKeyedState(
        final TypeSerializer<N> namespaceSerializer,
        StateDescriptor<S, V> stateDescriptor) throws Exception {

    // create a new blank key/value state
    S state = stateDescriptor.bind(new StateBackend() {
        @Override
        public <T> ValueState<T> createValueState(ValueStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createValueState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T> ListState<T> createListState(ListStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createListState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T> ReducingState<T> createReducingState(ReducingStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createReducingState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T, ACC, R> AggregatingState<T, R> createAggregatingState(
                AggregatingStateDescriptor<T, ACC, R> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createAggregatingState(namespaceSerializer, stateDesc);
        }
        ...

As you can see, calling bind on different StateDescriptors produces different kinds of state.

If a ReducingStateDescriptor was used, as above:

@Override
public ReducingState<T> bind(StateBackend stateBackend) throws Exception {
    return stateBackend.createReducingState(this);
}
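The descriptor-to-backend handoff is a double dispatch: the descriptor picks the factory method, and the backend picks the concrete implementation. A minimal sketch of the pattern, using hypothetical types (Backend, Descriptor, NamedBackend) that return plain strings instead of real state objects:

```java
final class BindDispatchSketch {

    /** Hypothetical backend: one factory method per kind of state. */
    interface Backend {
        String createListState(ListDescriptor descriptor);

        String createReducingState(ReducingDescriptor descriptor);
    }

    /** Hypothetical descriptor: knows which factory method it needs. */
    interface Descriptor {
        String bind(Backend backend);
    }

    static final class ListDescriptor implements Descriptor {
        @Override
        public String bind(Backend backend) {
            return backend.createListState(this);
        }
    }

    static final class ReducingDescriptor implements Descriptor {
        @Override
        public String bind(Backend backend) {
            return backend.createReducingState(this);
        }
    }

    /** Names its states by prefix; a real backend would build state objects instead. */
    static final class NamedBackend implements Backend {
        private final String prefix;

        NamedBackend(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String createListState(ListDescriptor descriptor) {
            return prefix + "ListState";
        }

        @Override
        public String createReducingState(ReducingDescriptor descriptor) {
            return prefix + "ReducingState";
        }
    }

    public static void main(String[] args) {
        Descriptor descriptor = new ReducingDescriptor();
        System.out.println(descriptor.bind(new NamedBackend("Heap")));
        System.out.println(descriptor.bind(new NamedBackend("RocksDB")));
    }
}
```

The caller only ever holds a Descriptor and a Backend; neither needs an instanceof check to arrive at the right combination.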

 

So if the RocksDB backend is used, a RocksDBReducingState is created, and add works as follows:

public class RocksDBReducingState<K, N, V>
        extends AbstractRocksDBState<K, N, ReducingState<V>, ReducingStateDescriptor<V>, V>
        implements InternalReducingState<N, V> {

    @Override
    public void add(V value) throws IOException {
        try {
            writeCurrentKeyWithGroupAndNamespace();
            byte[] key = keySerializationStream.toByteArray();
            byte[] valueBytes = backend.db.get(columnFamily, key);

            DataOutputViewStreamWrapper out = new DataOutputViewStreamWrapper(keySerializationStream);
            if (valueBytes == null) {
                keySerializationStream.reset();
                valueSerializer.serialize(value, out);
                backend.db.put(columnFamily, writeOptions, key, keySerializationStream.toByteArray());
            } else {
                V oldValue = valueSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStream(valueBytes)));
                V newValue = reduceFunction.reduce(oldValue, value); // merge the values with the reduce function
                keySerializationStream.reset();
                valueSerializer.serialize(newValue, out);
                backend.db.put(columnFamily, writeOptions, key, keySerializationStream.toByteArray()); // put the new value into the backend
            }
        } catch (Exception e) {
            throw new RuntimeException("Error while adding data to RocksDB", e);
        }
    }
    ...
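The add method above is a read-modify-write cycle on serialized bytes. The following plain-Java sketch reproduces only that cycle. The HashMap standing in for the database, the string composite key with a "|" separator, and the fixed long value type are all assumptions made for this illustration and do not reflect RocksDB's or Flink's actual key layout.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongBinaryOperator;

final class ByteStoreReducingSketch {

    // Stand-in for the key/value store: composite key -> serialized value.
    private final Map<String, byte[]> db = new HashMap<>();
    private final LongBinaryOperator reduceFunction;

    ByteStoreReducingSketch(LongBinaryOperator reduceFunction) {
        this.reduceFunction = reduceFunction;
    }

    // Illustrative composite key; separator and layout are assumptions of this sketch.
    private static String compositeKey(String key, String namespace) {
        return key + "|" + namespace;
    }

    private static byte[] serialize(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    private static long deserialize(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    /** Read the old bytes, reduce with the new value, write the result back. */
    void add(String key, String namespace, long value) {
        String k = compositeKey(key, namespace);
        byte[] oldBytes = db.get(k);
        long newValue = (oldBytes == null)
                ? value
                : reduceFunction.applyAsLong(deserialize(oldBytes), value);
        db.put(k, serialize(newValue));
    }

    /** Returns the reduced value, or null if nothing was added for this key and namespace. */
    Long get(String key, String namespace) {
        byte[] bytes = db.get(compositeKey(key, namespace));
        if (bytes == null) {
            return null;
        }
        return deserialize(bytes);
    }

    public static void main(String[] args) {
        ByteStoreReducingSketch state = new ByteStoreReducingSketch(Long::sum);
        state.add("user-1", "window-A", 3);
        state.add("user-1", "window-A", 4);
        state.add("user-1", "window-B", 10);
        System.out.println(state.get("user-1", "window-A"));
        System.out.println(state.get("user-1", "window-B"));
    }
}
```

Every add costs one deserialization and one serialization, which is the price of keeping only a single reduced value per key and window.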

 

aggregate

Here an AggregatingStateDescriptor is used.

There is also one extra parameter, TypeInformation<ACC> accumulatorType, because aggregate works by continuously updating this accumulator.

/**
 * Applies the given window function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the window function is
 * interpreted as a regular non-windowed stream.
 *
 * <p>Arriving data is incrementally aggregated using the given aggregate function. This means
 * that the window function typically has only a single value to process when called.
 *
 * @param aggregateFunction The aggregation function that is used for incremental aggregation.
 * @param windowFunction The window function.
 * @param accumulatorType Type information for the internal accumulator type of the aggregation function
 * @param resultType Type information for the result type of the window function
 *
 * @return The data stream that is the result of applying the window function to the window.
 *
 * @param <ACC> The type of the AggregateFunction's accumulator
 * @param <V> The type of AggregateFunction's result, and the WindowFunction's input
 * @param <R> The type of the elements in the resulting stream, equal to the
 *            WindowFunction's result type
 */
public <ACC, V, R> SingleOutputStreamOperator<R> aggregate(
        AggregateFunction<T, ACC, V> aggregateFunction,
        WindowFunction<V, R, K, W> windowFunction,
        TypeInformation<ACC> accumulatorType,
        TypeInformation<V> aggregateResultType,
        TypeInformation<R> resultType) {

    if (evictor != null) {
        // with an evictor, ListState is still used
    } else {
        AggregatingStateDescriptor<T, ACC, V> stateDesc = new AggregatingStateDescriptor<>("window-contents",
                aggregateFunction,
                accumulatorType.createSerializer(getExecutionEnvironment().getConfig()));

        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + udfName + ")";

        operator = new WindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalSingleValueWindowFunction<>(windowFunction),
                trigger,
                allowedLateness);
    }

    return input.transform(opName, resultType, operator);
}

In the end this uses RocksDBAggregatingState:

@Override
public R get() throws IOException {
    try {
        // prepare the current key and namespace for RocksDB lookup
        writeCurrentKeyWithGroupAndNamespace();
        final byte[] key = keySerializationStream.toByteArray();

        // get the current value
        final byte[] valueBytes = backend.db.get(columnFamily, key);

        if (valueBytes == null) {
            return null;
        }

        ACC accumulator = valueSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStreamWithPos(valueBytes)));
        return aggFunction.getResult(accumulator); // return the result derived from the accumulator
    }
    catch (IOException | RocksDBException e) {
        throw new IOException("Error while retrieving value from RocksDB", e);
    }
}

@Override
public void add(T value) throws IOException {
    try {
        // prepare the current key and namespace for RocksDB lookup
        writeCurrentKeyWithGroupAndNamespace();
        final byte[] key = keySerializationStream.toByteArray();
        keySerializationStream.reset();

        // get the current value
        final byte[] valueBytes = backend.db.get(columnFamily, key);

        // deserialize the current accumulator, or create a blank one
        // (create a new one, or deserialize the existing one from state)
        final ACC accumulator = valueBytes == null ?
                aggFunction.createAccumulator() :
                valueSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStreamWithPos(valueBytes)));

        // aggregate the value into the accumulator
        aggFunction.add(value, accumulator); // update the accumulator

        // serialize the new accumulator
        final DataOutputViewStreamWrapper out = new DataOutputViewStreamWrapper(keySerializationStream);
        valueSerializer.serialize(accumulator, out);

        // write the new value to RocksDB
        backend.db.put(columnFamily, writeOptions, key, keySerializationStream.toByteArray());
    }
    catch (IOException | RocksDBException e) {
        throw new IOException("Error while adding value to RocksDB", e);
    }
}

 

Here is an example of an aggFunction:

private static class AddingFunction implements AggregateFunction<Long, MutableLong, Long> {

    @Override
    public MutableLong createAccumulator() {
        return new MutableLong();
    }

    @Override
    public void add(Long value, MutableLong accumulator) {
        accumulator.value += value;
    }

    @Override
    public Long getResult(MutableLong accumulator) {
        return accumulator.value;
    }

    @Override
    public MutableLong merge(MutableLong a, MutableLong b) {
        a.value += b.value;
        return a;
    }
}

private static final class MutableLong {
    long value;
}

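Compared with reduce, aggregate is more general:

reduce: A1 reduce A2 = A3, so inputs and result all have the same type

aggregate: a1, a2, … aggregate = b, so the result type b can differ from the input type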
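A small plain-Java sketch of that difference, assuming a simplified, hypothetical Aggregator interface (it is not Flink's AggregateFunction and omits merge). A sum fits reduce because Long in, Long out; an average does not, because it needs a {sum, count} accumulator and produces a Double.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

final class ReduceVsAggregateSketch {

    /** Hypothetical, simplified aggregate contract: IN values, ACC accumulator, OUT result. */
    interface Aggregator<IN, ACC, OUT> {
        ACC createAccumulator();

        ACC add(IN value, ACC accumulator);

        OUT getResult(ACC accumulator);
    }

    /** reduce: combines values pairwise, the result has the element type. */
    static <T> T reduce(List<T> values, BinaryOperator<T> reducer) {
        T result = null;
        for (T v : values) {
            result = (result == null) ? v : reducer.apply(result, v);
        }
        return result;
    }

    /** aggregate: folds values into an accumulator, then derives the result from it. */
    static <IN, ACC, OUT> OUT aggregate(List<IN> values, Aggregator<IN, ACC, OUT> aggregator) {
        ACC accumulator = aggregator.createAccumulator();
        for (IN v : values) {
            accumulator = aggregator.add(v, accumulator);
        }
        return aggregator.getResult(accumulator);
    }

    /** Average: input Long, accumulator {sum, count}, output Double. */
    static final class Average implements Aggregator<Long, long[], Double> {
        @Override
        public long[] createAccumulator() {
            return new long[] {0L, 0L};
        }

        @Override
        public long[] add(Long value, long[] accumulator) {
            accumulator[0] += value;
            accumulator[1]++;
            return accumulator;
        }

        @Override
        public Double getResult(long[] accumulator) {
            if (accumulator[1] == 0) {
                return null; // no elements, no average
            }
            return (double) accumulator[0] / accumulator[1];
        }
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L, 4L);
        Long sum = reduce(values, Long::sum);
        Double average = aggregate(values, new Average());
        System.out.println("sum=" + sum + ", average=" + average);
    }
}
```

Any reduce can be written as an aggregate whose accumulator and result types equal the input type, but not the other way around.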

 

apply

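apply is more general still: nothing is pre-aggregated while elements are being cached. The data of the whole window has to be cached, and the function is applied only when the window fires.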

/**
 * Applies the given window function to each window. The window function is called for each
 * evaluation of the window for each key individually. The output of the window function is
 * interpreted as a regular non-windowed stream.
 *
 * <p>Note that this function requires that all data in the windows is buffered until the window
 * is evaluated, as the function provides no means of incremental aggregation.
 *
 * @param function The window function.
 * @param resultType Type information for the result type of the window function
 * @return The data stream that is the result of applying the window function to the window.
 */
public <R> SingleOutputStreamOperator<R> apply(WindowFunction<T, R, K, W> function, TypeInformation<R> resultType) {

    if (evictor != null) {
        // ...
    } else {
        // All elements have to be cached, so this is always a ListState.
        ListStateDescriptor<T> stateDesc = new ListStateDescriptor<>("window-contents",
                input.getType().createSerializer(getExecutionEnvironment().getConfig()));

        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + udfName + ")";

        operator = new WindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalIterableWindowFunction<>(function),
                trigger,
                allowedLateness,
                legacyWindowOpType);
    }

    return input.transform(opName, resultType, operator);
}

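This one is simple: you must supply a WindowFunction, which processes the window's contents when the window fires.

The resultType has to be specified here as well.

A ListStateDescriptor is used; this kind of state simply appends each element to a list.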
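A median is a typical computation that needs this mode, because it cannot be maintained from a single running value. The sketch below is plain Java with hypothetical names (ApplyBufferSketch, fire) and only imitates the buffer-then-apply behavior described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class ApplyBufferSketch {

    // Every element is kept until the window fires, like a list state.
    private final List<Long> buffer = new ArrayList<>();

    void add(long value) {
        buffer.add(value);
    }

    /** Hands the complete buffered contents to the window function, then clears them. */
    <R> R fire(Function<Iterable<Long>, R> windowFunction) {
        R result = windowFunction.apply(Collections.unmodifiableList(buffer));
        buffer.clear();
        return result;
    }

    /** Needs all elements at once, so it cannot be expressed as an incremental reduce. */
    static double median(Iterable<Long> input) {
        List<Long> sorted = new ArrayList<>();
        for (Long v : input) {
            sorted.add(v);
        }
        if (sorted.isEmpty()) {
            throw new IllegalArgumentException("empty window");
        }
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    public static void main(String[] args) {
        ApplyBufferSketch window = new ApplyBufferSketch();
        for (long v : new long[] {5, 1, 9, 3}) {
            window.add(v);
        }
        Double median = window.fire(ApplyBufferSketch::median);
        System.out.println("median=" + median);
    }
}
```

The cost is memory proportional to the number of elements in the window, which is the trade-off against reduce and aggregate.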

 

 

AggregationFunction

For example sum, min, and max:

/**
 * Applies an aggregation that sums every window of the data stream at the
 * given position.
 *
 * @param positionToSum The position in the tuple/array to sum
 * @return The transformed DataStream.
 */
public SingleOutputStreamOperator<T> sum(int positionToSum) {
    return aggregate(new SumAggregator<>(positionToSum, input.getType(), input.getExecutionConfig()));
}

 

public class SumAggregator<T> extends AggregationFunction<T> {

 

public abstract class AggregationFunction<T> implements ReduceFunction<T> {

    private static final long serialVersionUID = 1L;

    public enum AggregationType {
        SUM, MIN, MAX, MINBY, MAXBY,
    }
}

As you can see, the name is misleading: these AggregationFunctions are actually implemented with reduce.
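Why a positional sum fits the reduce shape can be shown with a short plain-Java sketch. The long[] "tuple" and the sumAt helper are hypothetical; in particular, keeping the remaining fields of the first value is a choice made for this sketch and not a statement about what Flink's SumAggregator does with them.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;

final class SumAsReduceSketch {

    /**
     * Builds a reducer that sums one position of a long[] "tuple".
     * Both inputs and the result have the same type, which is exactly the reduce shape.
     */
    static BinaryOperator<long[]> sumAt(int position) {
        return (a, b) -> {
            long[] result = a.clone(); // other fields are taken from the first value (sketch choice)
            result[position] = a[position] + b[position];
            return result;
        };
    }

    public static void main(String[] args) {
        BinaryOperator<long[]> sumSecondField = sumAt(1);
        long[] reduced = sumSecondField.apply(new long[] {1, 10}, new long[] {2, 20});
        System.out.println(Arrays.toString(reduced));
    }
}
```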

Reposted from: http://hghux.baihongyu.com/
