Practice 3: How to use PyHARK

Practice 3 on this page is for HARK Ver. 3.5.0 or later.

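PyHARK features

What PyHARK can do

Networks designed in HARK Designer can also be written in Python

PyHARK features

Adds HARK functionality to Python's wide range of libraries
Supports offline processing in addition to online processing

Two ways to program in Python with PyHARK

Converting a HARK Designer network program into a Python program that uses PyHARK
Writing Python programs directly on top of PyHARK's functions

From HARK network programs to function-style programming with PyHARK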

HARK Designer	Python with PyHARK
Node	node created by network.create
Node Property	node_xxx.add_input("PARAM", VALUE)
Link	node_xxx.add_input("NAME", node_yyy["NAME"])
Subnetwork with I/O terminals	function call by publisher and subscriber

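Overview of PyHARK programming

The HARK Designer network shown in the figure below

is converted into a Python program

Import the HARK module
Create the data structures and run a simple acoustic processing step
The converted program is shown below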

#!/usr/bin/env python3

import soundfile
import numpy
import hark

## Three processing steps equivalent to the AudioStreamFromWave node
## 0. Read the audio file
##    Read the audio file in float format (-1.0 to 1.0), then scale it to the float range
##    that HARK handles. This example uses the range (-32768.0 to 32768.0), which is
##    close to the 16-bit signed integer range (-32768 to 32767).
## -----------------------------------------------------------------------------------------
audio, rate = soundfile.read('input.wav', dtype=numpy.float32)
audio *= 2**15

## 1. Specify the analysis conditions for the audio file
##    Set the number of channels (nch) of the audio file, the window size (length), and the
##    shift width (advance) for frame-based processing.
## -----------------------------------------------------------------------------------------
nch = audio.shape[1]
length = 512
advance = 160

## 2. Create the frames using Python's numpy module.
## -----------------------------------------------------------------------------------------
frames = numpy.lib.stride_tricks.sliding_window_view(audio, length, axis=0)[::advance, :, :]

## Steps [0.], [1.] and [2.] above correspond to the processing of the AudioStreamFromWave node,
## and the frames variable corresponds to the output of the AudioStreamFromWave node.
## hark-lib is designed to connect seamlessly with HARK even for data created using
## Python modules.

## 3. Processing of the MultiFFT node
##    Apply an FFT to the multi-channel audio data with the MultiFFT node. Usage is simple: define a
##    node with hark.node.<NodeName>() and pass inputs and parameters as arguments to the node.
##    Each argument name is the name of an input terminal, so INPUT=frames
##    simply passes the frames data to the input terminal named INPUT.
## -----------------------------------------------------------------------------------------
multifft_node = hark.node.MultiFFT()
multifft_out = multifft_node(INPUT=frames)

## The following call changes the window function from the default CONJ to HAMMING.
## multifft_out = multifft_node(INPUT=frames, WINDOW="HAMMING")

## 4. Output the MultiFFT result
##    Getting the result of the MultiFFT node is easy. It is stored under the output
##    terminal name OUTPUT in the namespace of the output variable.
##    This holds even if the node has multiple outputs.
## -----------------------------------------------------------------------------------------
print(multifft_out.OUTPUT)
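
The framing in step 2 can be checked with plain arithmetic. The following is a minimal sketch using only the standard library; the helper names `frame_starts` and `to_hark_range` are hypothetical and not part of PyHARK. It assumes that only complete frames are kept, which is how `sliding_window_view(...)[::advance]` behaves.

```python
def frame_starts(num_samples, length=512, advance=160):
    """Return the start index of every complete frame.

    A frame covers `length` samples and consecutive frames are
    `advance` samples apart; incomplete trailing frames are dropped.
    """
    if num_samples < length:
        return []
    return list(range(0, num_samples - length + 1, advance))


def to_hark_range(sample):
    """Scale a float sample in [-1.0, 1.0] by 2**15, as in step 0."""
    return sample * 2 ** 15


# One second of audio at 16 kHz
starts = frame_starts(16000)
print(len(starts), starts[:3], starts[-1])
print(to_hark_range(0.5), to_hark_range(-1.0))
```

With `length = 512` and `advance = 160`, neighbouring frames overlap by 352 samples, so the number of frames is governed by `advance` rather than by `length`.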

HARK-specific object types

#!/usr/bin/env python3

import soundfile
import numpy
import hark

src_object = hark.harklib.Source()
src_object.id = 1
src_object.x = [1.0000, 0.0000, 0.0000]
src_object.power = 37.0

## A member that is a std::vector in C++ accepts both a list and a numpy.array(list(), dtype=numpy.float32).
src_object.x = numpy.array([0.0000, 0.0000, 1.0000], dtype=numpy.float32)

print(src_object.id)
print(src_object.x)
print(src_object.power)

How to access the native C++ container types provided by HARK

These include std::vector, hark::Matrix, hark::Tensor, and std::map.
The correspondence between C++ and Python is shown in the table below:

C++	Python
std::vector<float> vector = {1.0f, 0.0f, 0.0f};	vector = [1.0, 0.0, 0.0] or vector = numpy.array( [1.0, 0.0, 0.0], dtype=numpy.float32)
hark::Matrix<float> matrix; matrix.noalias() = Eigen::Matrix<float, 2, 2, Eigen::RowMajor>::Identity();	matrix = numpy.array( [[1, 0], [0, 1]], dtype=numpy.float32)
hark::Tensor<float> tensor; tensor.data.resize(8); for(size_t i=0; i<tensor.data.size(); i++){ tensor.data[i] = static_cast<float>(i);} std::vector<float> shape = {2, 2, 2}; std::swap(tensor.vsize, shape);	tensor = numpy.array( [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]], dtype=numpy.float32)
std::map<int, std::vector<float> > map_i_vf; map_i_vf[0] = std::vector<float>(3);	map_i_vf = {0: [0.0, 0.0, 0.0]}
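
To see how the flat `tensor.data` buffer in the C++ column relates to a nested Python array, here is a minimal sketch in plain Python. It assumes row-major (C-order) storage, where the last index varies fastest; the source does not state the storage order of hark::Tensor, so treat this as an assumption. The helper `reshape_row_major` is hypothetical.

```python
def reshape_row_major(flat, shape):
    """Nest a flat list into lists of lists following `shape`.

    The last dimension varies fastest (row-major / C order).
    """
    if len(shape) == 1:
        return list(flat[:shape[0]])
    step = len(flat) // shape[0]
    return [
        reshape_row_major(flat[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


# Same fill pattern as the C++ loop: data[i] = i for i in 0..7
flat = [float(i) for i in range(8)]
tensor = reshape_row_major(flat, [2, 2, 2])
print(tensor)
```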

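Differences between online and offline processing

Online processing

Processes data sequentially (one frame at a time)
Almost the same as processing with HARK Middleware
Well suited to real-time processing with microphone input

Offline processing (batch processing)

Processes the data all at once
Fits naturally with ordinary Python programming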
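
The difference between the two modes can be sketched in plain Python without PyHARK. In this hypothetical example, `frame_energy` stands in for any per-frame processing; the offline version takes the whole recording at once, while the online version consumes frames one by one as they arrive.

```python
def frame_energy(frame):
    """Toy per-frame processing: sum of squared samples."""
    return sum(x * x for x in frame)


def process_offline(frames):
    """Offline (batch): all frames are available up front."""
    return [frame_energy(f) for f in frames]


def process_online(frame_source):
    """Online: yield one result as soon as each frame arrives."""
    for frame in frame_source:
        yield frame_energy(frame)


frames = [[1.0, 2.0], [0.0, 3.0], [2.0, 2.0]]
batch = process_offline(frames)
stream = list(process_online(iter(frames)))
print(batch, stream)
```

Both modes produce the same values; what differs is when each result becomes available and whether the input must be complete before processing starts.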

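How PyHARK works and its advantages

Calls existing HARK functionality from Python

Supports both online and offline processing

Easy to learn for people familiar with Python (or other languages)
HARK can be used together with machine-learning libraries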

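Practice3-1: Online sound source localization from file input with PyHARK
Converting `practice3-1.n` to the Python program `practice3-1.py`

Networks designed in HARK Designer can also be written in Python

Adds HARK functionality to Python's wide range of libraries
Supports offline processing in addition to online processing
This chapter proceeds in the following order

Practice3-1 covers online sound source localization from file input (an improved version of practice1-1)
Network file created with HARK Designer: practice3-1.n
Python program using PyHARK: practice3-1.py

Practice3-1: The HARK Designer network file

Understand the structure of the network file practice3-1.n
Convert the MAIN network and the two subnetworks with PyHARK
Write the main function

Practice3-1: Network file for the MAIN network

The MAIN network in HARK Designer

Header and MAIN network portion of the network file practice3-1.n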

#!/usr/bin/env batchflow
<?xml version="1.0"?>
<Document>
  <Network type="subnet" name="MAIN">
    <Node name="node_LOOP_1" type="LOOP" x="520" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
    </Node>
    <Node name="node_Constant_1" type="Constant" x="100" y="100">
      <Parameter name="VALUE" type="string" value="input.wav" description="The value"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_InputStream_1" type="InputStream" x="300" y="100">
      <Parameter name="TYPE" type="string" value="" description="Type of stream: stream, fd, or FILE (default stream)"/>
      <Parameter name="RETRY" type="int" value="" description="If set to N, InputStream will retry N times on open fail"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" ...
    </Node>
    <Link from="node_Constant_1" output="VALUE" to="node_InputStream_1" input="INPUT"/>
    <Link from="node_InputStream_1" output="OUTPUT" to="node_LOOP_1" input="INPUT"/>
    <NetOutput name="OUTPUT" node="node_LOOP_1" terminal="OUTPUT" object_type="any" description="Dynamic"/>
  </Network>

Practice3-1: Network file for the sub_localization subnetwork

The sub_localization subnetwork in HARK Designer

The sub_localization subnetwork portion of the network file practice3-1.n

  <Network type="subnet" name="sub_localization">
    <Node name="node_LocalizeMUSIC_1" type="LocalizeMUSIC" x="370" y="100">
      <Parameter name="MUSIC_ALGORITHM" type="string" value="SEVD" description="Sound Source Localization Algorithm. If SEVD, NOISECM will be ignored"/>
      <Parameter name="TF_CHANNEL_SELECTION" type="object" value="<Vector<int> >" description="Microphone channels for localization. If vacant, all channels will be used."/>
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE"  description="Sampling Rate (Hz)."/>
      <Parameter name="TF_INPUT_TYPE" type="string" value="FILE" description="Load form TF file or Input terminal."/>
      <Parameter name="A_MATRIX" type="string" value="tf.zip" description="Filename of a transfer function matrix."/>
      <Parameter name="WINDOW" type="int" value="50" description="The number of frames used for calculating a correlation function."/>
      <Parameter name="WINDOW_TYPE" type="string" value="MIDDLE" description="Window selection to accumulate a correlation function. If PAST, the past WINDOW frames from the current frame are used for the accumulation. If MIDDLE, the current frame will be the middle of the accumulated frames. If FUTURE, the future WINDOW frames from the current frame are used for the accumulation. FUTURE is the default from version 1.0, but this makes a delay since we have to wait for the future information. PAST generates a internal buffers for the accumulation, which realizes no delay for localization."/>
      <Parameter name="PERIOD" type="int" value="50" description="The period in which the source localization is processed."/>
      <Parameter name="NUM_SOURCE" type="int" value="2" description="Number of sources, which should be less than number of channels."/>
      <Parameter name="MIN_DEG" type="int" value="-180" description="source direction (lower)."/>
      <Parameter name="MAX_DEG" type="int" value="180" description="source direction (higher)."/>
      <Parameter name="LOWER_BOUND_FREQUENCY" type="int" value="3000" description="Lower bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="UPPER_BOUND_FREQUENCY" type="int" value="6000" description="Upper bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="SPECTRUM_WEIGHT_TYPE" type="string" value="A_Characteristic" description="MUSIC spectrum weight for each frequency bin."/>
      <Parameter name="A_CHAR_SCALING" type="float" value="1.0" description="Scaling factor of the A-Weight with respect to frequency"/>
      <Parameter name="MANUAL_WEIGHT_SPLINE" type="object" value="<Matrix<float> <rows 2> <cols 5> <data 0.0 2000.0 4000.0 6000.0 8000.0 1.0 1.0 1.0 1.0 1.0> >" description="MUSIC spectrum weight for each frequency bin. This is a 2 by M matrix. The first row represents the frequency, and the second row represents the weight gain. "M" represents the number of key points for the spectrum weight. The frequency range between M key points will be interpolated by spline manner. The format is "&lt;Matrix&lt;float&gt; &lt;rows 2&gt; &lt;cols 2&gt; &lt;data 1 2 3 4&gt; &gt;"."/>
      <Parameter name="MANUAL_WEIGHT_SQUARE" type="object" value="<Vector<float> 0.0 2000.0 4000.0 6000.0 8000.0>" description="MUSIC spectrum weight for each frequency bin. This is a M order vector. The element represents the frequency points for the square wave. "M" represents the number of key points for the square wave weight. The format is "&lt;Vector&lt;float&gt; 1 2 3 4&gt;"."/>
      <Parameter name="ENABLE_EIGENVALUE_WEIGHT" type="bool" value="false" description="If true, the spatial spectrum is weighted depending on the eigenvalues of a correlation matrix. We do not suggest to use this function with GEVD and GSVD, because the NOISECM changes the eigenvalue drastically. Only useful for SEVD."/>
      <Parameter name="MAXNUM_OUT_PEAKS" type="int" value="-1" description="Maximum number of output peaks. If MAXNUM_OUT_PEAKS = NUM_SOURCE, this is compatible with HARK version 1.0. If MAXNUM_OUT_PEAKS = 0, all local maxima are output. If MAXNUM_OUT_PEAKS &lt; 0, MAXNUM_OUT_PEAKS is set to NUM_SOURCE. If MAXNUM_OUT_PEAKS &gt; 0, number of output peaks is limited to MAXNUM_OUT_PEAKS."/>
      <Parameter name="DEBUG" type="bool" value="true" description="Debug option. If the parameter is true, this node outputs sound localization results to a standard output."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceTracker_1" type="SourceTracker" x="590" y="100">
      <Parameter name="THRESH" type="float" value="25" description="Power threshold for localization results. A localization result with higher power than THRESH is tracked, otherwise ignored."/>
      <Parameter name="PAUSE_LENGTH" type="float" value="1200" description="Life duration of source in ms. When any localization result for a source is found for more than PAUSE_LENGTH / 10 iterations, the source is terminated. [default: 800]"/>
      <Parameter name="MIN_SRC_INTERVAL" type="float" value="20" description="Source interval threshold in degree. When the angle between a localization result and a source is smaller than MIN_SRC_INTERVAL, the same ID is given to the localization result. [default: 20]"/>
      <Parameter name="MIN_ID" type="int" value="0" description="Minimum ID of source locations. MIN_ID should be greater than 0 or equal."/>
      <Parameter name="DEBUG" type="bool" value="false" description="Output debug information if true [default: false]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceIntervalExtender_1" type="SourceIntervalExtender" x="820" y="100">
      <Parameter name="PREROLL_LENGTH" type="int" value="80" description="Preroll length in frame. [default: 50]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_plotQuickSourceKivy_1" type="plotQuickSourceKivy" x="1110" y="100">
    </Node>
    <Node name="node_CMIdentityMatrix_1" type="CMIdentityMatrix" x="100" y="190">
      <Parameter name="NB_CHANNELS" type="int" value="8" description="The number of input channels."/>
      <Parameter name="LENGTH" type="int" value="512" description="The length of a frame (per channel)."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_LocalizeMUSIC_1" output="OUTPUT" to="node_SourceTracker_1" input="INPUT"/>
    <Link from="node_SourceTracker_1" output="OUTPUT" to="node_SourceIntervalExtender_1" input="SOURCES"/>
    <Link from="node_SourceIntervalExtender_1" output="OUTPUT" to="node_plotQuickSourceKivy_1" input="SOURCES"/>
    <Link from="node_CMIdentityMatrix_1" output="OUTPUT" to="node_LocalizeMUSIC_1" input="NOISECM"/>
    <NetInput name="INPUT" node="node_LocalizeMUSIC_1" terminal="INPUT" object_type="Matrix&lt;complex&lt;float&gt; &gt;" description="Multi-channel audio signals. In this matrix, a row is a channel, and a column is a sample."/>
    <NetOutput name="OUTPUT" node="node_plotQuickSourceKivy_1" terminal="OUTPUT" object_type="any" description=""/>
  </Network>

Practice3-1: Network file for the LOOP subnetwork

The LOOP subnetwork in HARK Designer

The LOOP network portion of the network file practice3-1.n

  <Network type="iterator" name="LOOP">
    <Node name="node_MultiFFT_1" type="MultiFFT" x="430" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="USE_WAIT" type="bool" value="false" description="If true, real recording is simulated [default: false]."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_sub_localization_1" type="sub_localization" x="640" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling Rate (Hz)."/>
    </Node>
    <Node name="node_AudioStreamFromWave_1" type="AudioStreamFromWave" x="100" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="USE_WAIT" type="bool" value="false" description="If true, real recording is simulated [default: false]."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_localization_1" input="INPUT"/>
    <Link from="node_AudioStreamFromWave_1" output="AUDIO" to="node_MultiFFT_1" input="INPUT"/>
    <NetInput name="INPUT" node="node_AudioStreamFromWave_1" terminal="INPUT" object_type="Stream" description="An audio input stream (IStream)."/>
    <NetCondition name="CONDITION" node="node_AudioStreamFromWave_1" terminal="NOT_EOF"/>
    <NetOutput name="OUTPUT" node="node_sub_localization_1" terminal="OUTPUT" object_type="any" description=""/>
  </Network>
</Document>
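
Before converting a network by hand, it can help to list its nodes and links programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the embedded XML is a shortened, hypothetical excerpt of the LOOP network above, and `summarize` is an illustrative helper, not a HARK tool. A real `.n` file begins with the line `#!/usr/bin/env batchflow`, which would have to be removed before parsing, and attribute values may need proper XML escaping.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt modelled on the LOOP network (parameters omitted)
NETWORK_XML = """<Document>
  <Network type="iterator" name="LOOP">
    <Node name="node_MultiFFT_1" type="MultiFFT"/>
    <Node name="node_AudioStreamFromWave_1" type="AudioStreamFromWave"/>
    <Link from="node_AudioStreamFromWave_1" output="AUDIO" to="node_MultiFFT_1" input="INPUT"/>
  </Network>
</Document>"""


def summarize(xml_text):
    """Map each network name to its node types and its links."""
    root = ET.fromstring(xml_text)
    summary = {}
    for net in root.findall("Network"):
        nodes = {n.get("name"): n.get("type") for n in net.findall("Node")}
        links = [
            (l.get("from"), l.get("output"), l.get("to"), l.get("input"))
            for l in net.findall("Link")
        ]
        summary[net.get("name")] = {"nodes": nodes, "links": links}
    return summary


for name, info in summarize(NETWORK_XML).items():
    print(name, info["nodes"], info["links"])
```

Each `Node` entry corresponds to one `network.create` call and each `Link` entry to one `add_input` call on the receiving node.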

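Practice3-1: The converted Python program using PyHARK
practice3-1.py

Converting a HARK network file into a Python program that uses PyHARK

practice3-1.py: overview of the PyHARK version of practice3-1.n

Principles for converting a network file into a Python program that uses PyHARK:

Define every network as a class that inherits from hark.NetworkDef
Describe the network structure in the build() method
Create the required nodes with network.create
Define connections between nodes on the input-terminal side with node_xxx.add_input("NAME", node_yyy["NAME"])
Define node parameters with node_xxx.add_input("PARAM", VALUE)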
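
The chained `add_input` style used in these principles can be illustrated with a toy model. The class below is a hypothetical stand-in written only for this illustration; it is not the PyHARK node class and says nothing about how PyHARK is implemented. It only shows why the calls can be chained (each call returns the node itself) and how `node["NAME"]` can act as a reference to an output terminal.

```python
class ToyNode:
    """Illustrative stand-in for a node; NOT the PyHARK API."""

    def __init__(self, name):
        self.name = name
        self.inputs = {}

    def add_input(self, terminal, value):
        # Record a parameter value or a reference to another node's
        # output, then return self so that calls can be chained.
        self.inputs[terminal] = value
        return self

    def __getitem__(self, terminal):
        # A reference to one of this node's output terminals
        return (self.name, terminal)


fft = ToyNode("MultiFFT")
loc = (
    ToyNode("LocalizeMUSIC")
    .add_input("INPUT", fft["OUTPUT"])      # link from another node
    .add_input("MUSIC_ALGORITHM", "SEVD")   # parameter value
)
print(loc.inputs)
```

In this model, links and parameters go through the same method, which mirrors the way both are written with `add_input` in the conversion principles.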

The complete Python program `practice3-1.py`

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import threading
import time

import numpy as np
import hark
import soundfile as sf

# import plotQuickWaveformKivy
# import plotQuickSpecKivy
# import plotQuickMusicSpecKivy
import plotQuickSourceKivy

class HARK_Localization(hark.NetworkDef):
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):
        # Create the required nodes
        try:
            node_cm_identity_matrix = network.create(hark.node.CMIdentityMatrix, dispatch=hark.RepeatDispatcher)
            node_constant_for_operation_flag = network.create(hark.node.Constant, dispatch=hark.RepeatDispatcher)
            node_localize_music = network.create(hark.node.LocalizeMUSIC)
            node_source_tracker = network.create(hark.node.SourceTracker)
            node_source_interval_extender = network.create(hark.node.SourceIntervalExtender)
            node_plotsource_kivy = network.create(plotQuickSourceKivy.plotQuickSourceKivy)

        except BaseException as ex:
            print(ex)
        # Describe the connections between nodes (data flow) and the parameters,
        # and build a list of the nodes contained in the network
        try:
            r = [
                node_cm_identity_matrix
                    .add_input("NB_CHANNELS", 8)
                    .add_input("LENGTH", 512)
                ,
                node_constant_for_operation_flag
                    .add_input("VALUE", True)
                ,
                node_localize_music
                    .add_input("INPUT", input["SPEC"])
                    .add_input("NOISECM", node_cm_identity_matrix["OUTPUT"])
                    .add_input("OPERATION_FLAG", node_constant_for_operation_flag["OUTPUT"])
                    .add_input("MUSIC_ALGORITHM", "SEVD")
                    .add_input("A_MATRIX", "tf.zip")
                    .add_input("WINDOW_TYPE", "MIDDLE")
                    .add_input("LOWER_BOUND_FREQUENCY", 3000)
                    .add_input("UPPER_BOUND_FREQUENCY", 6000)
                    .add_input("SPECTRUM_WEIGHT_TYPE", "A_Characteristic")
                    .add_input("ENABLE_EIGENVALUE_WEIGHT", False)
                    .add_input("ENABLE_OUTPUT_SPECTRUM", True)
                ,
                node_source_tracker
                    .add_input("INPUT", node_localize_music["OUTPUT"])
                    .add_input("THRESH", 25.0)
                    .add_input("PAUSE_LENGTH", 1200.0)
                    #.add_input("MIN_SRC_INTERVAL", 20.0)
                ,
                node_source_interval_extender
                    .add_input("SOURCES", node_source_tracker["OUTPUT"])
                    .add_input("PREROLL_LENGTH", 80)
                ,
                node_plotsource_kivy
                    .add_input("SOURCES", node_source_interval_extender["OUTPUT"])
                ,
            ]

            output.add_input("OUTPUT", node_source_interval_extender["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))
        # Return the list of nodes
        return r

class HARK_Loop(hark.NetworkDef):
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes
        try:
            node_audio_stream_from_memory = network.create(
                hark.node.AudioStreamFromMemory, dispatch=hark.TriggeredMultiShotDispatcher, name="AudioStreamFromMemory")
            node_multi_fft = network.create(hark.node.MultiFFT)
            node_sub_localization = network.create(HARK_Localization, name="Localization")

        except BaseException as ex:
            print(ex)

        # Describe the connections between nodes (data flow) and the parameters,
        # and build a list of the nodes contained in the network
        try:
            r = [
                node_audio_stream_from_memory
                    .add_input("INPUT", input["INPUT"])
                    .add_input("CHANNEL_COUNT", 8)
                ,
                node_multi_fft
                    .add_input("INPUT", node_audio_stream_from_memory["AUDIO"])
                ,
                node_sub_localization
                    .add_input("SPEC", node_multi_fft["OUTPUT"])
                ,
            ]

            output.add_input("OUTPUT", node_sub_localization["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))
        # Return the list of nodes
        return r

class HARK_Main(hark.NetworkDef):
    # Class corresponding to the main network.
    # It receives an 8-channel audio signal as input,
    # applies the Fourier transform, MUSIC-based sound source localization,
    # and source tracking, and plots the result.

    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes:
        # - Publisher and Subscriber, which handle the overall input and output
        # - the HARK_Loop subnetwork
        try:
            node_publisher = network.create(hark.node.PublishData, dispatch=hark.RepeatDispatcher, name="Publisher")
            node_subscriber = network.create(hark.node.SubscribeData, name="Subscriber")

            loop = network.create(HARK_Loop, name="HARK_Loop")

        except BaseException as ex:
            print(ex)

        # Describe the connections between nodes (data flow) and the parameters,
        # and build a list of the nodes contained in the network
        try:
            r = [
                loop.add_input("INPUT", node_publisher["OUTPUT"])
                ,
                node_subscriber.add_input("INPUT", loop["OUTPUT"])
                ,
            ]

        except BaseException as ex:
            print(ex)
        # Return the list of nodes
        return r

def main():
    # Handle command-line arguments
    if len(sys.argv) < 2:
        print("no input file")
        return
    wavfilename = sys.argv[1]
    # Build the main network
    network = hark.Network.from_networkdef(HARK_Main, name="HARK_Main")
    # Get the input and output nodes of the main network
    publisher = network.query_nodedef("Publisher")
    subscriber = network.query_nodedef("Subscriber")

    # Define the action executed when the subscriber receives data
    # (that is, when the main network outputs a result).
    # Here, pass is used to indicate "do nothing".
    def received(data):
        pass

    subscriber.receive = received

    try:
        # Start the thread that runs the network
        th = threading.Thread(target=network.execute)
        th.start()

        # Read the input file
        audio, rate = sf.read(wavfilename, dtype=np.int16)

        # Split into frames
        advance = 160
        # The 2023 tutorial VM ships numpy==1.21.5
        # for numpy>=1.20.0
        frames = np.lib.stride_tricks.sliding_window_view(audio, advance, axis=0)[::advance, :, :]
        # for numpy < 1.20.0
        # frames = np.lib.stride_tricks.as_strided(
        #     audio,
        #     shape=(int(audio.shape[0]/advance), advance, audio.shape[1]),
        #     strides=(advance * audio.shape[1] * audio.strides[1], audio.shape[1]*audio.strides[1], audio.strides[1])
        # )

        # Process frame by frame
        for t, f in enumerate(frames):
            # If the network thread has stopped,
            # leave the loop and stop the whole process
            if not th.is_alive():
                break

            # Send one frame of the audio signal to the network
            publisher.push(f)
            # To make the processing time comparable to real-time processing,
            # adjust the interval between sends
            # time.sleep(advance / rate)
    except BaseException as ex:
        # Report the error and stop the network.
        # (The original bare "except:" clause after this one could never run.)
        print(ex)
        network.stop()
    finally:
        # Termination
        publisher.close()
        if th.ident is not None:
            th.join()

if __name__ == '__main__':
    main()
# end of file

Practice3-1: Run the PyHARK sound source localization and observe its behavior

Open a terminal and run the Python program

cd practice3/data
python practice3-1.py input.wav

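A window pops up and the sound source localization results are displayed

Because no wait is inserted, the localization results may scroll by quickly depending on PC performance
Running practice3-1w.py instead of practice3-1.py gives a speed close to real time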
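
The wait needed for near-real-time playback follows from the frame shift and the sampling rate. The sketch below assumes that practice3-1w.py pauses for `advance / rate` seconds per frame, as the commented-out `time.sleep(advance / rate)` line in practice3-1.py suggests; the actual contents of practice3-1w.py are not shown here. The helper names are hypothetical.

```python
def frame_interval_seconds(advance, rate):
    """Duration of audio covered by one frame shift."""
    return advance / rate


def playback_seconds(num_frames, advance, rate):
    """Approximate wall-clock time when pausing once per frame."""
    return num_frames * frame_interval_seconds(advance, rate)


interval = frame_interval_seconds(160, 16000)
print(interval)
print(playback_seconds(100, 160, 16000))
```

With a shift of 160 samples at 16 kHz, each frame corresponds to 10 ms of audio, so 100 frames take about one second when paced this way.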

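practice3-1.py: Details of the online sound source localization program using PyHARK

Principles of conversion to PyHARK (repeated)

Define every network as a class that inherits from hark.NetworkDef
Describe the network structure in the build() method
Create the required nodes with network.create
Define connections between nodes on the input-terminal side with node_xxx.add_input("NAME", node_yyy["NAME"])
Define node parameters with node_xxx.add_input("PARAM", VALUE)

Python class HARK_Localization (sound source localization subnetwork)

The sound source localization subnetwork sub_localization in HARK Designer

Define the HARK_Localization class as a class that inherits from hark.NetworkDef
Create the five required nodes with PyHARK's network.create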

class HARK_Localization(hark.NetworkDef):  # Class that inherits from hark.NetworkDef
    # Define the build() method and describe the structure of the subnetwork
    def build(
            self,                         # The object itself
            network: hark.Network,        # Entry point for creating nodes and related operations
            input:   hark.DataSourceMap,  # Inputs to this subnetwork
            output:  hark.DataSinkMap):   # Outputs from this subnetwork

        # LocalizeMUSIC (sound source localization)
        node_localize_music = network.create(hark.node.LocalizeMUSIC)
        # CMIdentityMatrix (creates an identity matrix used as the noise correlation matrix)
        node_cm_identity_matrix = network.create(
            hark.node.CMIdentityMatrix,
            dispatch=hark.RepeatDispatcher)
        # SourceTracker (source tracking)
        node_source_tracker = network.create(hark.node.SourceTracker)
        # SourceIntervalExtender (adjusts the localization start time)
        node_source_interval_extender = network.create(hark.node.SourceIntervalExtender)
        # plotQuickSourceKivy (visualizes the localization results)
        node_plotsource_kivy = network.create(plotQuickSourceKivy.plotQuickSourceKivy)

Parameter description of node_localize_music and its connection to node_cm_identity_matrix (figure below)
The corresponding program is shown below

class HARK_Localization(hark.NetworkDef):
    def build(self, network, input, output):
        # (see above)
        # ......

        # Describe the connections between nodes (data flow) and the parameters,
        # and build a list of the nodes contained in the network
        try:
            # Build a list of the nodes used; it becomes the return value of build()
            r = [
                node_cm_identity_matrix
                    .add_input("NB_CHANNELS", 8)
                    .add_input("LENGTH", 512)
                ,
                node_constant_for_operation_flag
                    .add_input("VALUE", True)
                ,
                node_localize_music
                    # Connect the build() argument input to the INPUT terminal of LocalizeMUSIC (figure above).
                    # The subnetwork input is named "SPEC", matching the name used in HARK_Loop.
                    .add_input("INPUT", input["SPEC"])
                    # Connect the OUTPUT terminal of CMIdentityMatrix to the NOISECM input terminal (figure above)
                    .add_input("NOISECM", node_cm_identity_matrix["OUTPUT"])
                    .add_input("OPERATION_FLAG", node_constant_for_operation_flag["OUTPUT"])
                    # Set the values from the LocalizeMUSIC Property dialog (figure below)
                    .add_input("MUSIC_ALGORITHM", "SEVD")
                    .add_input("A_MATRIX", "tf.zip")
                    .add_input("WINDOW_TYPE", "MIDDLE")
                    .add_input("LOWER_BOUND_FREQUENCY", 3000)
                    .add_input("UPPER_BOUND_FREQUENCY", 6000)
                    .add_input("SPECTRUM_WEIGHT_TYPE", "A_Characteristic")
                    .add_input("ENABLE_EIGENVALUE_WEIGHT", False)
                    .add_input("ENABLE_OUTPUT_SPECTRUM", True)
                ,
                node_source_tracker
                    .add_input("INPUT", node_localize_music["OUTPUT"])
                    .add_input("THRESH", 25.0)
                    .add_input("PAUSE_LENGTH", 1200.0)
                    .add_input("MIN_SRC_INTERVAL", 20.0)
                ,
                node_source_interval_extender
                    .add_input("SOURCES", node_source_tracker["OUTPUT"])
                    .add_input("PREROLL_LENGTH", 80)
                ,
                node_plotsource_kivy
                    .add_input("SOURCES", node_source_interval_extender["OUTPUT"])
                ,
            ]

            output.add_input("OUTPUT", node_source_interval_extender["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))
        # Return the list of nodes
        return r

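How the connection from node_cm_identity_matrix is described on node_localize_music (see the program above)

The input argument of build() in the HARK_Localization class is connected in node_localize_music (see the program above)

Parameter settings of the node_localize_music node (see the program above)

Setting the parameters of node_localize_music with PyHARK

Build a list of the nodes used and make it the return value of build() (see the program above)

Practice3-1: The LOOP subnetwork in Python

The LOOP subnetwork in HARK Designer (figure below)

is replaced by a Python program that uses PyHARK (program below)

Create the three required nodes with network.create
Use AudioStreamFromMemory instead of AudioStreamFromWave/Mic
MultiFFT (Fourier transform)
Specify the class of the user-defined sound source localization subnetwork HARK_Localization
Return the list of created nodes from build()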

class HARK_Loop(hark.NetworkDef):
    def build(self, network, input, output):
        # ...
        # Use AudioStreamFromMemory instead of AudioStreamFromWave/Mic
        node_audio_stream_from_memory = network.create(
            hark.node.AudioStreamFromMemory,
            dispatch=hark.TriggeredMultiShotDispatcher,
            name="AudioStreamFromMemory")
        # MultiFFT (Fourier transform)
        node_multi_fft = network.create(hark.node.MultiFFT)
        # Specify the class of the user-defined sound source localization subnetwork HARK_Localization
        node_sub_localization = network.create(HARK_Localization, name="Localization")

        r = [
            node_audio_stream_from_memory
                .add_input("INPUT", input["INPUT"])
                .add_input("CHANNEL_COUNT", 8),
            node_multi_fft
                .add_input("INPUT", node_audio_stream_from_memory["AUDIO"]),
            node_sub_localization
                .add_input("SPEC", node_multi_fft["OUTPUT"]),
        ]
        output.add_input("OUTPUT", node_sub_localization["OUTPUT"])

        return r

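Practice3-1: The MAIN network in Python

The MAIN network in HARK Designer (excluding Constant and InputStream)

is converted into a Python program using PyHARK

Define the MAIN network as a class

Create publisher and subscriber, two special nodes that handle the input and output of the whole network
Create the LOOP subnetwork
Constant and InputStream are not created as nodes; they are implemented in plain Python rather than with PyHARK

Describing the node connections and return value of the MAIN network

Note that the input is publisher["OUTPUT"] and the output is subscriber["INPUT"]
Build a list of the nodes used and return it from build()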

class HARK_Main(hark.NetworkDef):  # Define the network as a class
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):
        # Create publisher and subscriber, two special nodes that
        # handle the input and output of the whole network
        node_publisher = network.create(
            hark.node.PublishData,
            dispatch=hark.RepeatDispatcher,
            name="Publisher"
        )
        node_subscriber = network.create(
            hark.node.SubscribeData,
            name="Subscriber"
        )
        # Create the LOOP subnetwork
        loop = network.create(HARK_Loop, name="HARK_Loop")
        # Input and output of the LOOP subnetwork
        loop.add_input("INPUT", node_publisher["OUTPUT"])     # Input to the MAIN network
        node_subscriber.add_input("INPUT", loop["OUTPUT"])    # Output from the MAIN network

        # Build a list of the nodes used and return it from build()
        r = [loop, node_subscriber]
        return r

Practice3-1: Python `main` function

Structure of the main function (Constant and InputStream are defined here)

def main():
    ...
    # Build the main network
    network = hark.Network.from_networkdef(HARK_Main, name="HARK_Main")
    # Get the nodes used for input to and output from the main network
    publisher = network.query_nodedef("Publisher")
    subscriber = network.query_nodedef("Subscriber")

    # Define the action executed when the subscriber receives data
    # (that is, when the main network outputs a result).
    # Here, pass is used to specify "do nothing".
    def received(data):
        pass

    subscriber.receive = received
    ......
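
The `received` callback above discards every result. As a minimal sketch of a callback that keeps the results instead, the example below puts each item into a thread-safe `queue.Queue`. The `subscriber` object here is a hypothetical stand-in built with `SimpleNamespace`, not the PyHARK node, and the dictionaries passed to it are invented placeholder data; the only thing taken from the text above is the `subscriber.receive = received` assignment.

```python
import queue
from types import SimpleNamespace

# Thread-safe buffer shared between the network thread and the main thread.
results = queue.Queue()


def received(data):
    # Called once per output item; store it instead of discarding it.
    results.put(data)


# Hypothetical stand-in for the object returned by
# network.query_nodedef("Subscriber"); it only holds the callback.
subscriber = SimpleNamespace(receive=None)
subscriber.receive = received

# Simulate the network delivering two outputs (placeholder data).
for item in ({"frame": 0}, {"frame": 1}):
    subscriber.receive(item)

# Drain the queue on the main thread.
collected = []
while not results.empty():
    collected.append(results.get())
print(collected)
```

A queue is used because, in the program above, the callback would presumably run on the thread that executes the network while the results are consumed elsewhere.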

Defining the input in the main function

def main():
    # (see above)
    ......
    # Read the WAV file
    audio, rate = sf.read("input.wav", dtype=np.int16)

    # Split the signal at fixed intervals (every 160 samples)
    # (using numpy.lib.stride_tricks.sliding_window_view)
    frames = sliding_window_view(audio, 160, axis=0)[::160, :, :]

    # Create and start a thread that executes the network
    th = threading.Thread(target=network.execute)
    th.start()
    ......
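
With a window of 160 and a step of 160, the frames do not overlap, and a trailing remainder shorter than 160 samples is dropped. The following is a small pure-Python sketch of the same indexing for a single channel, written without NumPy so that the arithmetic is explicit. `split_frames` is a hypothetical helper, the 500-sample list is made-up data, and the 16 kHz rate is an assumption taken from the `SAMPLING_RATE` value used in the network files on this page.

```python
def split_frames(samples, length=160, advance=160):
    """Return full frames only; a trailing partial frame is dropped."""
    last_start = len(samples) - length
    return [samples[s:s + length] for s in range(0, last_start + 1, advance)]


samples = list(range(500))        # stand-in for one channel of audio
frames = split_frames(samples)

n_frames = len(frames)            # frames start at samples 0, 160 and 320
dropped = len(samples) - n_frames * 160

# Duration of one frame, assuming a 16 kHz sampling rate.
frame_seconds = 160 / 16000

print(n_frames, dropped, frame_seconds)
```

Under that 16 kHz assumption, one frame covers 10 ms of audio.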

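Describing the publisher and the return value in the main function

def main():
    ......
    try:
        for f in frames:
            if not th.is_alive():  # If the network thread has stopped,
                break              # do no further processing

            publisher.push(f)      # Send the split audio signal to the network

            # Adjust the sending interval so that the processing time is
            # roughly the same as in real-time processing
            time.sleep(0.01)

    finally:  # Cleanup
        publisher.close()
        if th.ident is not None:
            th.join()
        ......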
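
The loop above follows a common producer/consumer pattern: start a worker thread, push data while the worker is alive, then close the input and join the thread in a `finally` block. The sketch below reproduces only that control flow using the standard library. `ToyPublisher` and `execute` are hypothetical stand-ins for the Publisher node and `network.execute`; they do not reflect how PyHARK is implemented, and the "processing" is just a sum of each frame.

```python
import queue
import threading


class ToyPublisher:
    """Hypothetical stand-in for the Publisher node: a queue with a close marker."""

    _CLOSED = object()

    def __init__(self):
        self._q = queue.Queue()

    def push(self, frame):
        self._q.put(frame)

    def close(self):
        # Signal the consumer that no more frames will arrive.
        self._q.put(self._CLOSED)

    def frames(self):
        while True:
            item = self._q.get()
            if item is self._CLOSED:
                return
            yield item


publisher = ToyPublisher()
processed = []


def execute():
    # Stand-in for network.execute: runs until the publisher is closed.
    for frame in publisher.frames():
        processed.append(sum(frame))


th = threading.Thread(target=execute)
th.start()
try:
    for f in ([1, 2], [3, 4], [5, 6]):
        if not th.is_alive():   # stop sending if the worker has died
            break
        publisher.push(f)
finally:
    publisher.close()           # always unblock the worker
    if th.ident is not None:    # join only a thread that was started
        th.join()

print(processed)
```

Closing the publisher inside `finally` matters in this sketch: without it, the worker would block forever on the queue if the sending loop raised an exception.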

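Practice3-2: Online sound source localization from microphone input with PyHARK
Programming with PyHARK: `practice-3-1r.py`

This chapter proceeds in the following order

Practice3-2 covers online sound source localization from microphone input (an improved version of practice1-2)
Network file created with HARK Designer: practice3-1r.n (see the listing below)
The changes in practice3-1r.n relative to practice3-1.n are:
1. Microphone input (corresponding PyHARK Python program: practice3-2.py)
2. Sound source separation and speech recognition are also included (corresponding PyHARK Python program: practice3-3.py)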

#!/usr/bin/env batchflow
<?xml version="1.0"?>
<Document>
  <Network type="subnet" name="MAIN">
    <Node name="node_LOOP_1" type="LOOP" x="520" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
    </Node>
    <Node name="node_Constant_1" type="Constant" x="100" y="100">
      <Parameter name="VALUE" type="string" value="input.wav" description="The value"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_InputStream_1" type="InputStream" x="290" y="100">
      <Parameter name="TYPE" type="string" value="" description="Type of stream: stream, fd, or FILE (default stream)"/>
      <Parameter name="RETRY" type="int" value="" description="If set to N, InputStream will retry N times on open fail"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_Constant_1" output="VALUE" to="node_InputStream_1" input="INPUT"/>
    <Link from="node_InputStream_1" output="OUTPUT" to="node_LOOP_1" input="INPUT"/>
    <NetOutput name="ASR-A" node="node_LOOP_1" terminal="ASR-A" object_type="any" description="Dynamic"/>
    <NetOutput name="DUMMY" node="node_LOOP_1" terminal="DUMMY" object_type="any" description="Dynamic"/>
  </Network>
  <Network type="subnet" name="sub_separation">
    <Node name="node_GHDSS_1" type="GHDSS" x="100" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="LOWER_BOUND_FREQUENCY" type="int" value="0" description="Lower bound of frequency (Hz). [default: 0]"/>
      <Parameter name="UPPER_BOUND_FREQUENCY" type="int" value="8000" description="Upper bound of frequency (Hz). [default: 8000]"/>
      <Parameter name="TF_INPUT_TYPE" type="string" value="FILE" description="Load form TF file or Input terminal."/>
      <Parameter name="TF_CONJ_FILENAME" type="string" value="tf.zip" description="Filename of a pre-measured transfer function for separation."/>
      <Parameter name="INITW_FILENAME" type="string" value="" description="Filename of an initial separation matrix. If specified, a matrix in INITW_FILENAME is used as an initial separation matrix. Otherwise, initial separation matrix is estimated from the geometrical relationship or pre-measured TF according to TF_CONJ."/>
      <Parameter name="SS_METHOD" type="string" value="ADAPTIVE" description="The calculation method for SS step size parameter corresponding to the blind separation part. "FIX" uses a fixed step size,"LC_MYU" uses the same value as LC_MYU, and "ADAPTIVE" adaptively estimates an optimal step size. [default: ADAPTIVE]"/>
      <Parameter name="SS_SCAL" type="float" value="1.0" description="Scaling factor for SS step size. [default: 1.0]"/>
      <Parameter name="SS_MYU" type="float" value="0.001" description="SS step size value. [default 0.001]"/>
      <Parameter name="NOISE_FLOOR" type="float" value="0.0" description="Noise floor value. [default 0.0]"/>
      <Parameter name="LC_CONST" type="string" value="DIAG" description="The calculation method for geometric constraints. "FULL" uses all elements of a matrix, and "DIAG" only uses diagonal parts. [default: FULL]"/>
      <Parameter name="LC_METHOD" type="string" value="ADAPTIVE" description="The calculation method for LC step size corresponding to geometric constraints. "FIX" uses a fixed value, and "Adaptive" adaptively estimates an optimal step size. [default: ADAPTIVE]"/>
      <Parameter name="LC_MYU" type="float" value="0.001" description="LC step size value. [default 0.001]"/>
      <Parameter name="UPDATE_METHOD_TF_CONJ" type="string" value="POS" description="Switching method of TF_CONJ data. [default: POS]"/>
      <Parameter name="UPDATE_METHOD_W" type="string" value="ID" description="Switching method of separation matrix, W. [default: ID]"/>
      <Parameter name="UPDATE_ACCEPT_DISTANCE" type="float" value="300" description="Distance allowance to switch separation matrix in [mm]. available when when UPDATE_METHOD_W is POS or ID_POS. [default: 300.0]"/>
      <Parameter name="EXPORT_W" type="bool" value="false" description="Separation matrix W is exported if true. [default: false]"/>
      <Parameter name="EXPORT_W_FILENAME" type="string" value="" description="The filename to export W."/>
      <Parameter name="UPDATE" type="string" value="STEP" description="The update method of separation matrix. "STEP" updates W sequentially, i.e., based on SS and then on LC cost. "TOTAL" updates W based on an integrated value of SS and LC cost [default: STEP]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SaveWavePCM_1" type="SaveWavePCM" x="580" y="200">
      <Parameter name="BASENAME" type="string" value="sep_" description="Basename of files. [default: sep_]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (in samples)[default: 16000]."/>
      <Parameter name="BITS" type="string" value="int24" description="Bit format of samples. int16 and int24  bits are supported."/>
      <Parameter name="INPUT_BITS" type="string" value="auto" description="Bit format of input wav file."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_Synthesize_1" type="Synthesize" x="370" y="200">
      <Parameter name="LENGTH" type="int" value="512" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="ADVANCE" type="int" value="160" description="The length in sample between a frame and a previous frame. [default: 160]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate (Hz) [default: 16000]."/>
      <Parameter name="MIN_FREQUENCY" type="int" value="125" description="Minimum frequency (Hz) [default: 125]"/>
      <Parameter name="MAX_FREQUENCY" type="int" value="7900" description="Maximum frequency (Hz) [default: 7900]"/>
      <Parameter name="WINDOW" type="string" value="HAMMING" description="A window function for overlap-add. WINDOW should be CONJ, HAMMING, RECTANGLE, or HANNING. [default: HAMMING]"/>
      <Parameter name="OUTPUT_GAIN" type="float" value="1.0" description="Output gain factor. [default: 1.0]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_Synthesize_1" output="OUTPUT" to="node_SaveWavePCM_1" input="INPUT"/>
    <Link from="node_GHDSS_1" output="OUTPUT" to="node_Synthesize_1" input="INPUT"/>
    <NetInput name="SPEC" node="node_GHDSS_1" terminal="INPUT_FRAMES" object_type="Matrix<complex<float> >" description="Input multi-channel spectrum. A row is a channel, and a column is a spectrum for the corresponding channel."/>
    <NetInput name="SOURCE" node="node_GHDSS_1" terminal="INPUT_SOURCES" object_type="Vector&lt;ObjectRef&gt;" description="Source locations with ID. Each element of the vector is a source location with ID specified by "Source"."/>
    <NetOutput name="DUMMY" node="node_SaveWavePCM_1" terminal="OUTPUT" object_type="Map&lt;int,ObjectRef&gt;" description="The same as input."/>
    <NetOutput name="OUTPUT" node="node_GHDSS_1" terminal="OUTPUT" object_type="Map&lt;int,ObjectRef&gt;" description="Separated spectrum with a source ID(key), and the value is the separated spectrum for each sound source (Vector&lt;complex&lt;float&gt; &gt;)."/>
  </Network>
  <Network type="subnet" name="sub_recognition">
    <Node name="node_Delta_1" type="Delta" x="1070" y="130">
      <Parameter name="FBANK_COUNT" type="int" value="41" description="The size of the input feature vector."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_FeatureRemover_1" type="FeatureRemover" x="370" y="310">
      <Parameter name="SELECTOR" type="object" value="<Vector<int> 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81>" description="Component indices in a feature vector to remove. E.g. &lt;Vector&lt;int&gt; 13&gt; to remove 14th comopnent (The index start with 0)."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MSLSExtraction_1" type="MSLSExtraction" x="800" y="130">
      <Parameter name="FBANK_COUNT" type="subnet_param" value="FBANK_COUNT" description="Size of the static part of MSLS feature vector. [default: 13]"/>
      <Parameter name="NORMALIZATION_MODE" type="string" value="SPECTRAL" description="The domain to perform normalization. CEPSTRAL or SPECTRAL. [default: CEPSTRAL]"/>
      <Parameter name="USE_LEGACY_MODE" type="bool" value="true" description="For more than 14 dimensions must use false. This parameter is preparing only for compatibility with HARK 2.x or earlier. [default: true]"/>
      <Parameter name="USE_HTK_LIFTER" type="bool" value="false" description="Use HTK liftering vector if true. [default: false]"/>
      <Parameter name="LIFTERING_COEF" type="int" value="22" description="The HTK liftering coefficient used in Cepstral mode. [default: 22]"/>
      <Parameter name="USE_POWER" type="bool" value="true" description="Use power feature if true. [default: false]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_MelFilterBank_1" type="MelFilterBank" x="570" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling rate in Hz.  [default: 16000]"/>
      <Parameter name="CUTOFF" type="int" value="8000" description="Cutoff frequency in Hz. Mel-filterbanks are placed between 0 Hz and CUTOFF Hz. [default: 8000]"/>
      <Parameter name="MIN_FREQUENCY" type="int" value="63" description="Minimum frequency (Hz) [default: 63]"/>
      <Parameter name="MAX_FREQUENCY" type="int" value="8000" description="Maximum frequency (Hz) [default: 8000]"/>
      <Parameter name="FBANK_COUNT" type="subnet_param" value="FBANK_COUNT" description="The number of Mel filter banks. [default: 13]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_PreEmphasis_1" type="PreEmphasis" x="350" y="150">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="window length in sample [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling rate in Hz [default: 16000]"/>
      <Parameter name="PREEMCOEF" type="float" value="0.97" description="pre-emphasis coefficient [default: 0.97]"/>
      <Parameter name="INPUT_TYPE" type="string" value="SPECTRUM" description="The domain to perform pre-emphasis [default: WAV]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SpeechRecognitionClient_1" type="SpeechRecognitionClient" x="1130" y="310">
      <Parameter name="MFM_ENABLED" type="bool" value="false" description="MFM is enbaled if true. [default: true]"/>
      <Parameter name="HOST" type="string" value="127.0.0.1" description="Hostname or IP of Julius/Julian server. [default: 127.0.0.1]"/>
      <Parameter name="PORT" type="int" value="5530" description="Port number of Julius/Julian server. [default: 5530]"/>
      <Parameter name="SOCKET_ENABLED" type="bool" value="true" description="send data via socket if true. [default: true]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SpectralMeanNormalizationIncremental_1" type="SpectralMeanNormalizationIncremental" x="620" y="310">
      <Parameter name="FBANK_COUNT" type="int" value="13" description="Size of a static part of a feature vector. [default: 13]"/>
      <Parameter name="PERIOD" type="int" value="20" description="Period to calculate SM. If 1, SM calculated in every frame. If 0, SM_ALGORITHM is used for all frames. [default: 20]"/>
      <Parameter name="SM_ALGORITHM" type="string" value="INCREMENTAL" description="Algorithm to decide the initial SM until reaching PERIOD + IGNORE_FRAMES. If INCREMENTAL, SM is calculated in every frame. If PREV_SM, SM is taken from the previous source. If ZERO, SM is zeroed. If FILE, SM is loaded from SM_FILENAME."/>
      <Parameter name="SM_FILENAME" type="string" value="" description="csv file name of the initial SM until reaching PERIOD + IGNORE_FRAMES."/>
      <Parameter name="SM_HISTORY_FILENAME" type="string" value="" description="csv file name of the initial spectral history for an initial SM"/>
      <Parameter name="IGNORE_FRAMES" type="int" value="0" description="Number of first ingnored frames to calculate SM [default: 0]"/>
      <Parameter name="BASENAME" type="string" value="smn_" description="Basename of SMN files. The filename will be BASENAME+id+.csv [default: smn_]"/>
      <Parameter name="OUTPUT_FNAME" type="string" value="out.txt" description="No Description Available"/>
      <Parameter name="SM_EXPORT_FILENAME" type="string" value="" description="csv file name of the export SM which follows SM_EXPORT_ALGORITHM. If vacant, the exporting is disabled."/>
      <Parameter name="SM_EXPORT_ALGORITHM" type="string" value="LAST_SRC" description="Algorithm to decide the SM for exporting. If LAST_SRC, SM of the last source is saved. If SRC_AVERAGE, SM of all the sources are averaged and saved. If FRAME_AVERAGE, SM of all the frames in all sources are averaged and saved."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_MelFilterBank_1" output="OUTPUT" to="node_MSLSExtraction_1" input="FBANK"/>
    <Link from="node_MSLSExtraction_1" output="OUTPUT" to="node_Delta_1" input="INPUT"/>
    <Link from="node_Delta_1" output="OUTPUT" to="node_FeatureRemover_1" input="INPUT"/>
    <Link from="node_PreEmphasis_1" output="OUTPUT" to="node_MelFilterBank_1" input="INPUT"/>
    <Link from="node_PreEmphasis_1" output="OUTPUT" to="node_MSLSExtraction_1" input="SPECTRUM"/>
    <Link from="node_FeatureRemover_1" output="OUTPUT" to="node_SpectralMeanNormalizationIncremental_1" input="INPUT"/>
    <Link from="node_SpectralMeanNormalizationIncremental_1" output="OUTPUT" to="node_SpeechRecognitionClient_1" input="FEATURES"/>
    <Link from="node_SpectralMeanNormalizationIncremental_1" output="OUTPUT" to="node_SpeechRecognitionClient_1" input="MASKS"/>
    <NetOutput name="ASR-A" node="node_SpeechRecognitionClient_1" terminal="OUTPUT" object_type="Vector<ObjectRef>" description="The same as SOURCES."/>
    <NetInput name="SPEC" node="node_PreEmphasis_1" terminal="INPUT" object_type="Map&lt;int,ObjectRef&gt;" description="ObjectRef is a spectrum (Vector&lt;complex&lt;float&gt; &gt;) or a wave form (Vector&lt;float&gt;)."/>
    <NetInput name="SOURCES" node="node_SpeechRecognitionClient_1" terminal="SOURCES" object_type="Vector&lt;ObjectRef&gt;" description="Source locations with ID. Each element of the vector is a source location with ID specified by "Source"."/>
  </Network>
  <Network type="subnet" name="sub_localization">
    <Node name="node_LocalizeMUSIC_1" type="LocalizeMUSIC" x="320" y="100">
      <Parameter name="MUSIC_ALGORITHM" type="string" value="SEVD" description="Sound Source Localization Algorithm. If SEVD, NOISECM will be ignored"/>
      <Parameter name="TF_CHANNEL_SELECTION" type="object" value="<Vector<int> 0 1 2 3 4 5 6 7>" description="Microphone channels for localization. If vacant, all channels will be used."/>
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="subnet_param" value="SAMPLING_RATE" description="Sampling Rate (Hz)."/>
      <Parameter name="TF_INPUT_TYPE" type="string" value="FILE" description="Load form TF file or Input terminal."/>
      <Parameter name="A_MATRIX" type="string" value="tf.zip" description="Filename of a transfer function matrix."/>
      <Parameter name="WINDOW" type="int" value="50" description="The number of frames used for calculating a correlation function."/>
      <Parameter name="WINDOW_TYPE" type="string" value="MIDDLE" description="Window selection to accumulate a correlation function. If PAST, the past WINDOW frames from the current frame are used for the accumulation. If MIDDLE, the current frame will be the middle of the accumulated frames. If FUTURE, the future WINDOW frames from the current frame are used for the accumulation. FUTURE is the default from version 1.0, but this makes a delay since we have to wait for the future information. PAST generates a internal buffers for the accumulation, which realizes no delay for localization."/>
      <Parameter name="PERIOD" type="int" value="50" description="The period in which the source localization is processed."/>
      <Parameter name="NUM_SOURCE" type="int" value="2" description="Number of sources, which should be less than number of channels."/>
      <Parameter name="MIN_DEG" type="int" value="-180" description="source direction (lower)."/>
      <Parameter name="MAX_DEG" type="int" value="180" description="source direction (higher)."/>
      <Parameter name="LOWER_BOUND_FREQUENCY" type="int" value="3000" description="Lower bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="UPPER_BOUND_FREQUENCY" type="int" value="6000" description="Upper bound of frequency (Hz) used for correlation function calculation."/>
      <Parameter name="SPECTRUM_WEIGHT_TYPE" type="string" value="A_Characteristic" description="MUSIC spectrum weight for each frequency bin."/>
      <Parameter name="A_CHAR_SCALING" type="float" value="1.0" description="Scaling factor of the A-Weight with respect to frequency"/>
      <Parameter name="MANUAL_WEIGHT_SPLINE" type="object" value="<Matrix<float> <rows 2> <cols 5> <data 0.0 2000.0 4000.0 6000.0 8000.0 1.0 1.0 1.0 1.0 1.0> >" description="MUSIC spectrum weight for each frequency bin. This is a 2 by M matrix. The first row represents the frequency, and the second row represents the weight gain. "M" represents the number of key points for the spectrum weight. The frequency range between M key points will be interpolated by spline manner. The format is "&lt;Matrix&lt;float&gt; &lt;rows 2&gt; &lt;cols 2&gt; &lt;data 1 2 3 4&gt; &gt;"."/>
      <Parameter name="MANUAL_WEIGHT_SQUARE" type="object" value="<Vector<float> 0.0 2000.0 4000.0 6000.0 8000.0>" description="MUSIC spectrum weight for each frequency bin. This is a M order vector. The element represents the frequency points for the square wave. "M" represents the number of key points for the square wave weight. The format is "&lt;Vector&lt;float&gt; 1 2 3 4&gt;"."/>
      <Parameter name="ENABLE_EIGENVALUE_WEIGHT" type="bool" value="false" description="If true, the spatial spectrum is weighted depending on the eigenvalues of a correlation matrix. We do not suggest to use this function with GEVD and GSVD, because the NOISECM changes the eigenvalue drastically. Only useful for SEVD."/>
      <Parameter name="MAXNUM_OUT_PEAKS" type="int" value="-1" description="Maximum number of output peaks. If MAXNUM_OUT_PEAKS = NUM_SOURCE, this is compatible with HARK version 1.0. If MAXNUM_OUT_PEAKS = 0, all local maxima are output. If MAXNUM_OUT_PEAKS &lt; 0, MAXNUM_OUT_PEAKS is set to NUM_SOURCE. If MAXNUM_OUT_PEAKS &gt; 0, number of output peaks is limited to MAXNUM_OUT_PEAKS."/>
      <Parameter name="DEBUG" type="bool" value="true" description="Debug option. If the parameter is true, this node outputs sound localization results to a standard output."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceTracker_1" type="SourceTracker" x="540" y="100">
      <Parameter name="THRESH" type="float" value="25" description="Power threshold for localization results. A localization result with higher power than THRESH is tracked, otherwise ignored."/>
      <Parameter name="PAUSE_LENGTH" type="float" value="1200" description="Life duration of source in ms. When any localization result for a source is found for more than PAUSE_LENGTH / 10 iterations, the source is terminated. [default: 800]"/>
      <Parameter name="MIN_SRC_INTERVAL" type="float" value="20" description="Source interval threshold in degree. When the angle between a localization result and a source is smaller than MIN_SRC_INTERVAL, the same ID is given to the localization result. [default: 20]"/>
      <Parameter name="MIN_ID" type="int" value="0" description="Minimum ID of source locations. MIN_ID should be greater than 0 or equal."/>
      <Parameter name="DEBUG" type="bool" value="false" description="Output debug information if true [default: false]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_SourceIntervalExtender_1" type="SourceIntervalExtender" x="770" y="100">
      <Parameter name="PREROLL_LENGTH" type="int" value="80" description="Preroll length in frame. [default: 50]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_plotQuickSourceKivy_1" type="plotQuickSourceKivy" x="1060" y="100">
    </Node>
    <Node name="node_CMIdentityMatrix_1" type="CMIdentityMatrix" x="50" y="190">
      <Parameter name="NB_CHANNELS" type="int" value="8" description="The number of input channels."/>
      <Parameter name="LENGTH" type="int" value="512" description="The length of a frame (per channel)."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_LocalizeMUSIC_1" output="OUTPUT" to="node_SourceTracker_1" input="INPUT"/>
    <Link from="node_SourceTracker_1" output="OUTPUT" to="node_SourceIntervalExtender_1" input="SOURCES"/>
    <Link from="node_SourceIntervalExtender_1" output="OUTPUT" to="node_plotQuickSourceKivy_1" input="SOURCES"/>
    <Link from="node_CMIdentityMatrix_1" output="OUTPUT" to="node_LocalizeMUSIC_1" input="NOISECM"/>
    <NetOutput name="OUTPUT" node="node_plotQuickSourceKivy_1" terminal="OUTPUT" object_type="any" description=""/>
    <NetInput name="INPUT" node="node_LocalizeMUSIC_1" terminal="INPUT" object_type="Matrix&lt;complex&lt;float&gt; &gt;" description="Multi-channel audio signals. In this matrix, a row is a channel, and a column is a sample."/>
  </Network>
  <Network type="iterator" name="LOOP">
    <Node name="node_MultiFFT_1" type="MultiFFT" x="630" y="100">
      <Parameter name="LENGTH" type="subnet_param" value="LENGTH" description="FFT length in sample. [default: 512]"/>
      <Parameter name="WINDOW" type="string" value="CONJ" description="A window function for FFT. WINDOW should be CONJ, HAMMING, RECTANGLE, or HANNING. [default: CONJ]"/>
      <Parameter name="WINDOW_LENGTH" type="subnet_param" value="LENGTH" description="Window length of the window function. [default: 512]"/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Node name="node_sub_separation_1" type="sub_separation" x="1090" y="100">
    </Node>
    <Node name="node_sub_recognition_1" type="sub_recognition" x="1330" y="170">
      <Parameter name="FBANK_COUNT" type="int" value="40" description="The size of the input feature vector."/>
      <Parameter name="LENGTH" type="int" value="512" description="Size of window length in sample. [default: 512]"/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling rate in Hz.  [default: 16000]"/>
    </Node>
    <Node name="node_sub_localization_1" type="sub_localization" x="840" y="190">
      <Parameter name="LENGTH" type="int" value="512" description="The length of a frame (per channel)."/>
      <Parameter name="SAMPLING_RATE" type="int" value="16000" description="Sampling Rate (Hz)."/>
    </Node>
    <Node name="node_AudioStreamFromWave_1" type="AudioStreamFromWave" x="300" y="100">
      <Parameter name="LENGTH" type="int" value="512" description="The frame length of each channel (in samples) [default: 512]."/>
      <Parameter name="ADVANCE" type="int" value="160" description="The shift length beween adjacent frames (in samples)[default: 160]."/>
      <Parameter name="USE_WAIT" type="bool" value="false" description="If true, real recording is simulated [default: false]."/>
      <Parameter name="HARKMW.PROCESS" type="string" value="local" description="Specify a name that indicates the machine to be executed. [default: local]"/>
      <Parameter name="HARKMW.TOPIC" type="string" value="local" description="Specifies the prefix of the topic name used for data transmission and reception by MQTT. [default: local] In actual transmission, '/<terminal name>' is added to the value specified here. e.g.) 'local/VALUE'"/>
    </Node>
    <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_separation_1" input="SPEC"/>
    <Link from="node_sub_localization_1" output="OUTPUT" to="node_sub_separation_1" input="SOURCE"/>
    <Link from="node_sub_separation_1" output="OUTPUT" to="node_sub_recognition_1" input="SPEC"/>
    <Link from="node_sub_localization_1" output="OUTPUT" to="node_sub_recognition_1" input="SOURCES"/>
    <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_localization_1" input="INPUT"/>
    <Link from="node_AudioStreamFromWave_1" output="AUDIO" to="node_MultiFFT_1" input="INPUT"/>
    <NetInput name="INPUT" node="node_AudioStreamFromWave_1" terminal="INPUT" object_type="Stream" description="An audio input stream (IStream)."/>
    <NetOutput name="ASR-A" node="node_sub_recognition_1" terminal="ASR-A" object_type="any" description="Dynamic"/>
    <NetOutput name="DUMMY" node="node_sub_separation_1" terminal="DUMMY" object_type="Map&lt;int,ObjectRef&gt;" description="The same as input."/>
    <NetCondition name="CONDITION" node="node_AudioStreamFromWave_1" terminal="NOT_EOF"/>
  </Network>
</Document>
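
The correspondence between a `<Link>` element in the network file and a PyHARK `add_input` call can be sketched mechanically. The snippet below is only an illustration under stated assumptions: it parses three `<Link>` elements copied from the LOOP network above and prints each as a line of the form `dst.add_input("IN", src["OUT"])`. The function name `links_as_add_input` is hypothetical and is not part of PyHARK or HARK Designer.

```python
import xml.etree.ElementTree as ET

# Three <Link> elements copied from the LOOP network above.
SNIPPET = """
<Network type="iterator" name="LOOP">
  <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_separation_1" input="SPEC"/>
  <Link from="node_MultiFFT_1" output="OUTPUT" to="node_sub_localization_1" input="INPUT"/>
  <Link from="node_AudioStreamFromWave_1" output="AUDIO" to="node_MultiFFT_1" input="INPUT"/>
</Network>
"""


def links_as_add_input(xml_text):
    """Return one 'dst.add_input("IN", src["OUT"])' string per <Link> element."""
    root = ET.fromstring(xml_text)
    lines = []
    for link in root.iter("Link"):
        lines.append('{dst}.add_input("{inp}", {src}["{out}"])'.format(
            dst=link.get("to"), inp=link.get("input"),
            src=link.get("from"), out=link.get("output")))
    return lines


for line in links_as_add_input(SNIPPET):
    print(line)
```

The printed strings are text only; they show the naming pattern, not a working PyHARK program.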

First, check the microphone input.
Walkthrough of the PyHARK Python program practice3-2.py

Practice3-2: Checking the microphone and running online sound source localization with PyHARK

Open a terminal and list the audio devices.

$ cd practice3/data
$ python practice3-2.py -l
    0 Ensoniq AudioPCI: ES1371 DAC2/ADC (hw:0,0), ALSA (2 in, 0 out)
    1 Ensoniq AudioPCI: ES1371 DAC1 (hw:0,1), ALSA (0 in, 2 out)
    2 TAMAGO-XX: USB Audio (hw:1,0), ALSA (8 in, 0 out)
    3 sysdefault, ALSA (128 in, 0 out)
    4 samplerate, ALSA (128 in, 0 out)
    5 speexrate, ALSA (128 in, 0 out)
    6 pulse, ALSA (32 in, 32 out)
    7 upmix, ALSA (8 in, 0 out)
    8 vdownmix, ALSA (6 in, 0 out)
  * 9 default, ALSA (32 in, 32 out)

Note the name or number of the device you want to use.
Run the program, passing the device number or device name with -d (here "-d 2").

$ python practice3-2.py -d 2
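
The `-d` option accepts either a number or a name. As a rough sketch of that idea, the code below converts the argument to an integer when possible and otherwise treats it as a substring of the device name. The device list is a shortened copy of the listing above, and `find_device` is a hypothetical stand-in written for illustration; in the real program the lookup is left to the sounddevice library.

```python
def int_or_str(text):
    """Return text as int if it looks like a number, otherwise unchanged."""
    try:
        return int(text)
    except ValueError:
        return text


# (index, name) pairs copied from part of the device listing above.
DEVICES = [
    (0, "Ensoniq AudioPCI: ES1371 DAC2/ADC (hw:0,0)"),
    (2, "TAMAGO-XX: USB Audio (hw:1,0)"),
    (9, "default"),
]


def find_device(spec, devices=DEVICES):
    """Hypothetical lookup: by index for an int, by case-insensitive substring for a str."""
    if isinstance(spec, int):
        matches = [d for d in devices if d[0] == spec]
    else:
        matches = [d for d in devices if spec.lower() in d[1].lower()]
    if len(matches) != 1:
        raise ValueError("no unique device for {!r}".format(spec))
    return matches[0]


print(find_device(int_or_str("2")))
print(find_device(int_or_str("tamago")))
```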

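Speak toward the microphone array.

The localization results are displayed.
How do the localization results change when you turn the microphone array?
Press Ctrl-C to quit.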

If it does not work

Suspend the program with Ctrl-Z, then stop the job with, for example, kill %1.
Unplug the microphone array and plug it back in.

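Details of `practice3-2.py`, the online sound source localization program for microphone input

Practice3-2: Overall structure of `practice3-2.py`

The basic structure is the same as the file-input version, practice3-1.py. The outline of practice3-2.py: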

#!/usr/bin/env python
import hark                     # import the required modules
import argparse                 # command-line argument handling
import sounddevice as sd        # recording from the microphone array

class HARK_Localization(hark.NetworkDef):   # sound source localization subnetwork
    def build(self, network, input, output):
        ......

class HARK_Main_Loop(hark.NetworkDef):      # receives waveform data and runs the FFT
    def build(self, network, input, output):
        ......

# Input: 8-channel audio signal => Fourier transform, MUSIC-based localization, source tracking => display of results
class HARK_Main(hark.NetworkDef):           # main network
    def build(self, network, input, output):
        ......

def main():                                 # main function
    ......

if __name__ == '__main__':
    main()

The full listing of practice3-2.py follows:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import threading
import time
import argparse
import tempfile

import numpy as np
import hark

import sounddevice as sd
import soundfile as sf

# import plotQuickWaveformKivy
# import plotQuickSpecKivy
# import plotQuickMusicSpecKivy
import plotQuickSourceKivy


class HARK_Localization(hark.NetworkDef):
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes
        try:
            node_cm_identity_matrix = network.create(
                hark.node.CMIdentityMatrix,
                dispatch=hark.RepeatDispatcher
            )
            node_constant__for_operation_flag = network.create(
                hark.node.Constant,
                dispatch=hark.RepeatDispatcher
            )
            node_localize_music = network.create(hark.node.LocalizeMUSIC)
            node_source_tracker = network.create(hark.node.SourceTracker)
            node_source_interval_extender = network.create(hark.node.SourceIntervalExtender)
            node_plotsource_kivy = network.create(plotQuickSourceKivy.plotQuickSourceKivy)

        except BaseException as ex:
            print(ex)

        # Describe the connections between nodes (the data flow) and the parameters,
        # and build a list of the nodes that belong to the network
        try:
            r = [
                node_cm_identity_matrix
                    .add_input("NB_CHANNELS", 8)
                    .add_input("LENGTH", 512)
                ,
                node_constant__for_operation_flag
                    .add_input("VALUE", True)
                ,
                node_localize_music
                    .add_input("INPUT", input["SPEC"])
                    .add_input("NOISECM", node_cm_identity_matrix["OUTPUT"])
                    .add_input("OPERATION_FLAG", node_constant__for_operation_flag["OUTPUT"])
                    .add_input("MUSIC_ALGORITHM", "SEVD")
                    .add_input("A_MATRIX", "tf.zip")
                    .add_input("WINDOW_TYPE", "MIDDLE")
                    .add_input("LOWER_BOUND_FREQUENCY", 3000)
                    .add_input("UPPER_BOUND_FREQUENCY", 6000)
                    .add_input("SPECTRUM_WEIGHT_TYPE", "A_Characteristic")
                    .add_input("ENABLE_EIGENVALUE_WEIGHT", False)
                    .add_input("ENABLE_OUTPUT_SPECTRUM", True)
                ,
                node_source_tracker
                    .add_input("INPUT", node_localize_music["OUTPUT"])
                    .add_input("THRESH", 25.0)
                    .add_input("PAUSE_LENGTH", 1200.0)
                    #.add_input("MIN_SRC_INTERVAL", 20.0)
                ,
                node_source_interval_extender
                    .add_input("SOURCES", node_source_tracker["OUTPUT"])
                    .add_input("PREROLL_LENGTH", 80)
                ,
                node_plotsource_kivy
                    .add_input("SOURCES", node_source_interval_extender["OUTPUT"])
                ,
            ]

            output.add_input("OUTPUT", node_source_interval_extender["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))

        # Return the list of nodes
        return r


class HARK_Main_Loop(hark.NetworkDef):
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes
        try:
            node_audio_stream_from_memory = network.create(
                hark.node.AudioStreamFromMemory,
                dispatch=hark.TriggeredMultiShotDispatcher,
                name="AudioStreamFromMemory"
            )
            node_multi_fft = network.create(hark.node.MultiFFT)
            node_sub_localization = network.create(
                HARK_Localization,
                name="Localization"
            )

        except BaseException as ex:
            print(ex)

        # Describe the connections between nodes (the data flow) and the parameters,
        # and build a list of the nodes that belong to the network
        try:
            r = [
                node_audio_stream_from_memory
                    .add_input("INPUT", input["INPUT"])
                    .add_input("CHANNEL_COUNT", 8)
                ,
                node_multi_fft
                    .add_input("INPUT", node_audio_stream_from_memory["AUDIO"])
                ,
                node_sub_localization
                    .add_input("SPEC", node_multi_fft["OUTPUT"])
                ,
            ]

            output.add_input("OUTPUT", node_sub_localization["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))

        # Return the list of nodes
        return r


class HARK_Main(hark.NetworkDef):
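    '''Class corresponding to the main network.
    Receives an 8-channel audio signal as input, applies the Fourier
    transform, MUSIC-based sound source localization, and source tracking,
    and plots the result.
    '''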

    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes:
        # - the Publisher and Subscriber that handle the overall input and output
        # - the HARK_Main_Loop subnetwork
        try:
            node_publisher = network.create(
                hark.node.PublishData,
                dispatch=hark.RepeatDispatcher,
                name="Publisher"
            )
            node_subscriber = network.create(
                hark.node.SubscribeData,
                name="Subscriber"
            )
            loop = network.create(
                HARK_Main_Loop,
                name="HARK_Main_Loop"
            )
        except BaseException as ex:
            print(ex)

        # Describe the connections between nodes (the data flow) and the parameters,
        # and build a list of the nodes that belong to the network
        try:
            r = [
                loop
                    .add_input("INPUT", node_publisher["OUTPUT"])
                ,
                node_subscriber
                    .add_input("INPUT", loop["OUTPUT"])
                ,
            ]
        except BaseException as ex:
            print(ex)

        # Return the list of nodes
        return r


def main():

    def int_or_str(text):
        """Helper function for argument parsing."""
        try:
            return int(text)
        except ValueError:
            return text

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        '-l', '--list-devices', action='store_true',
        help='show list of audio devices and exit')
    args, remaining = parser.parse_known_args()
    if args.list_devices:
        print(sd.query_devices())
        parser.exit(0)
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[parser])
    parser.add_argument(
        'filename', nargs='?', metavar='FILENAME',
        help='audio file to store recording to')
    parser.add_argument(
        '-d', '--device', type=int_or_str,
        help='input device (numeric ID or substring)')
    parser.add_argument(
        '-r', '--samplerate', type=int, default=16000, help='sampling rate')
    parser.add_argument(
        '-c', '--channels', type=int, default=8, help='number of input channels')
    parser.add_argument(
        '-t', '--subtype', type=str, help='sound file subtype (e.g. "PCM_24")')
    args = parser.parse_args(remaining)

    if args.samplerate is None:
        device_info = sd.query_devices(args.device, 'input')
        # soundfile expects an int, sounddevice provides a float:
        args.samplerate = int(device_info['default_samplerate'])
    if args.channels is None:
        device_info = sd.query_devices(args.device, 'input')
        args.channels = device_info['max_input_channels']
    if args.filename is None:
        args.filename = tempfile.mktemp(prefix='practice3-2_',
                                        suffix='.wav', dir='')

    # Build the main network
    network = hark.Network.from_networkdef(HARK_Main, name="HARK_Main")

    # Get the input and output endpoints of the main network
    publisher = network.query_nodedef("Publisher")
    subscriber = network.query_nodedef("Subscriber")

    # Define what happens when the subscriber receives data
    # (that is, when the main network outputs a result).
    # Using pass here means "do nothing".
    def received(data):
        pass

    subscriber.receive = received

    def callback(indata, frames, time, status):
        # print(indata.shape, time.currentTime)
        publisher.push(indata.T)

    try:
        # Start the thread that executes the network
        th = threading.Thread(target=network.execute)
        th.start()

        with sd.InputStream(samplerate=args.samplerate, blocksize=160,
                            device=args.device, dtype=np.int16,
                            channels=args.channels, callback=callback) as stream:
            print('#' * 75)
            print('press Ctrl+C to stop the recording')
            print('#' * 75)
            th.join()

    except BaseException as ex:
        print(ex)
    except:
        network.stop()
    finally:
        # Cleanup
        publisher.close()
        if th.ident is not None:
            th.join()


if __name__ == '__main__':
    main()

# end of file
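
The listing opens the input stream with `blocksize=160` at the default sampling rate of 16000 Hz. The arithmetic behind those numbers can be worked through as below. This is a sketch under assumptions: the frame length 512 and shift 160 are taken from the `LENGTH` and `ADVANCE` parameters of AudioStreamFromWave in the network file, and it is assumed (not stated in the source) that the in-memory input path frames the signal the same way. The helper names are hypothetical.

```python
SAMPLE_RATE = 16000   # default of --samplerate in the listing
BLOCKSIZE = 160       # blocksize passed to sd.InputStream
LENGTH = 512          # frame length (assumed from the network file)
ADVANCE = 160         # frame shift (assumed from the network file)


def block_duration_ms(blocksize, rate):
    """Duration of one recorded block in milliseconds."""
    return 1000.0 * blocksize / rate


def frames_available(total_samples, length, advance):
    """Number of complete frames contained in total_samples samples."""
    if total_samples < length:
        return 0
    return 1 + (total_samples - length) // advance


print(block_duration_ms(BLOCKSIZE, SAMPLE_RATE))        # one callback per 10 ms
print(frames_available(SAMPLE_RATE, LENGTH, ADVANCE))   # frames in one second of audio
```

With these values each callback delivers exactly one frame shift of new samples, so one new frame becomes available per callback once the first 512 samples have arrived.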

Practice3-2: The `sub_Localization` subnetwork (the HARK_Localization class)

class HARK_Localization(hark.NetworkDef):
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes
        try:
            node_cm_identity_matrix = network.create(
                hark.node.CMIdentityMatrix,
                dispatch=hark.RepeatDispatcher
            )
            node_constant__for_operation_flag = network.create(
                hark.node.Constant,
                dispatch=hark.RepeatDispatcher
            )
            node_localize_music = network.create(hark.node.LocalizeMUSIC)
            node_source_tracker = network.create(hark.node.SourceTracker)
            node_source_interval_extender = network.create(hark.node.SourceIntervalExtender)
            node_plotsource_kivy = network.create(plotQuickSourceKivy.plotQuickSourceKivy)

        except BaseException as ex:
            print(ex)

        # Describe the connections between nodes (the data flow) and the parameters,
        # and build a list of the nodes that belong to the network
        try:
            r = [
                node_cm_identity_matrix
                    .add_input("NB_CHANNELS", 8)
                    .add_input("LENGTH", 512)
                ,
                node_constant__for_operation_flag
                    .add_input("VALUE", True)
                ,
                node_localize_music
                    .add_input("INPUT", input["SPEC"])
                    .add_input("NOISECM", node_cm_identity_matrix["OUTPUT"])
                    .add_input("OPERATION_FLAG", node_constant__for_operation_flag["OUTPUT"])
                    .add_input("MUSIC_ALGORITHM", "SEVD")
                    .add_input("A_MATRIX", "tf.zip")
                    .add_input("WINDOW_TYPE", "MIDDLE")
                    .add_input("LOWER_BOUND_FREQUENCY", 3000)
                    .add_input("UPPER_BOUND_FREQUENCY", 6000)
                    .add_input("SPECTRUM_WEIGHT_TYPE", "A_Characteristic")
                    .add_input("ENABLE_EIGENVALUE_WEIGHT", False)
                    .add_input("ENABLE_OUTPUT_SPECTRUM", True)
                ,
                node_source_tracker
                    .add_input("INPUT", node_localize_music["OUTPUT"])
                    .add_input("THRESH", 25.0)
                    .add_input("PAUSE_LENGTH", 1200.0)
                    #.add_input("MIN_SRC_INTERVAL", 20.0)
                ,
                node_source_interval_extender
                    .add_input("SOURCES", node_source_tracker["OUTPUT"])
                    .add_input("PREROLL_LENGTH", 80)
                ,
                node_plotsource_kivy
                    .add_input("SOURCES", node_source_interval_extender["OUTPUT"])
                ,
            ]

            output.add_input("OUTPUT", node_source_interval_extender["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))

        # Return the list of nodes
        return r
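
LocalizeMUSIC is configured with `LOWER_BOUND_FREQUENCY` 3000 and `UPPER_BOUND_FREQUENCY` 6000, while the spectrum comes from a 512-point FFT at 16 kHz. As a worked example, the standard DFT relation (bin = frequency × FFT length / sampling rate) gives the bin range these bounds correspond to. How LocalizeMUSIC handles the bounds internally is not described here, so treat this only as the generic relation; `freq_to_bin` is a hypothetical helper.

```python
def freq_to_bin(freq_hz, fft_length=512, sample_rate=16000):
    """Nearest DFT bin index for a frequency in Hz."""
    return int(round(freq_hz * fft_length / sample_rate))


bin_width_hz = 16000 / 512          # spacing between neighbouring bins
lower_bin = freq_to_bin(3000)
upper_bin = freq_to_bin(6000)
print(bin_width_hz, lower_bin, upper_bin)
```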

Practice3-2: The `main` function

Structure of the main function

def main():
    parser = argparse.ArgumentParser(add_help=False)    # command-line argument handling
    parser.add_argument(
        '-l', '--list-devices', action='store_true',    # -l: print the device list and exit
        help='show list of audio devices and exit')
    args, remaining = parser.parse_known_args()
    if args.list_devices:
        print(sd.query_devices())
        parser.exit(0)

    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[parser])
    parser.add_argument(                                # -d: input device
        '-d', '--device', type=int_or_str,
        help='input device (numeric ID or substring)')
    parser.add_argument(                                # -r: sampling rate
        '-r', '--samplerate', type=int, default=16000, help='sampling rate')
    parser.add_argument(                                # -c: number of input channels
        '-c', '--channels', type=int, default=8, help='number of input channels')
    args = parser.parse_args(remaining)
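
The argument handling above runs in two stages: a first parser that only knows `-l`, and a second parser that inherits it through `parents=[...]` and handles the remaining options. A minimal, self-contained sketch of that pattern is shown below. `parse_cli` is a hypothetical wrapper that takes the argument list explicitly so that it can be tried without a microphone; the defaults are the ones used in the full listing of practice3-2.py.

```python
import argparse


def parse_cli(argv):
    """Two-stage parsing: handle -l first, then everything else."""
    # Stage 1: only -l is known; unknown options are passed through untouched.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument('-l', '--list-devices', action='store_true')
    args, remaining = pre.parse_known_args(argv)
    if args.list_devices:
        return args                 # the real program prints the devices and exits here

    # Stage 2: the full parser inherits -l from the first one.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument('-d', '--device')
    parser.add_argument('-r', '--samplerate', type=int, default=16000)
    parser.add_argument('-c', '--channels', type=int, default=8)
    return parser.parse_args(remaining)


print(parse_cli(['-l']))
print(parse_cli(['-d', '2']))
```

Splitting the parser this way lets `-l` work even when other required settings are missing or invalid.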

Creating the main network and defining the on-receive behavior in the main function

def main():
    ......
    network = hark.Network.from_networkdef(     # create the main network
        HARK_Main, name="HARK_Main")
    publisher = network.query_nodedef("Publisher")
    subscriber = network.query_nodedef("Subscriber")

    def received(data):     # define what runs when the subscriber receives data
        pass
    subscriber.receive = received

    th = threading.Thread(target=network.execute)   # run the network in a new thread
    th.start()
    ......
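
The division of labor in this part of `main` (one thread executes the network, another pushes data into the publisher, and a callback receives results) can be imitated with the standard library alone. The sketch below is not PyHARK: `FakePublisher` and `run_network` are hypothetical stand-ins built on `queue.Queue` that only mirror the `push` / `close` / `receive` roles seen above.

```python
import queue
import threading


class FakePublisher:
    """Hypothetical stand-in for the Publisher node: a thread-safe FIFO."""
    _CLOSED = object()

    def __init__(self):
        self._q = queue.Queue()

    def push(self, block):
        self._q.put(block)

    def close(self):
        self._q.put(self._CLOSED)       # sentinel that ends the consumer loop

    def __iter__(self):
        while True:
            item = self._q.get()
            if item is self._CLOSED:
                return
            yield item


def run_network(publisher, received):
    """Stand-in for network.execute: consume blocks until the publisher closes."""
    for block in publisher:
        received(len(block))


def demo():
    results = []
    pub = FakePublisher()
    th = threading.Thread(target=run_network, args=(pub, results.append))
    th.start()                          # the "network" runs in its own thread
    for n in (160, 160, 80):
        pub.push([0] * n)               # what the recording callback would do
    pub.close()
    th.join()
    return results


print(demo())
```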

The publisher and return values in the main function

def main():
    ......
    # Define the recording callback; it passes each recorded array to the publisher
    def callback(indata, frames, time, status):
        # Transpose to the [nch, samples] layout that HARK expects
        publisher.push(np.copy(indata.T))

    try:
        # Recording from the audio device starts when the input stream is opened
        with sd.InputStream(samplerate=args.samplerate,
                            device=args.device, dtype=np.int16,
                            channels=args.channels,
                            callback=callback) as stream:
            print('press Ctrl+C to stop the recording')
            th.join()       # wait for the network thread while recording runs
    ......

Speech recognition with `practice3-2r.py`

Speech recognition with HARK Designer (Practice2)

Converted into a Python program with PyHARK (practice3-2r.py)

import hark

class HARK_Localization(hark.NetworkDef):
    def build(self, network, input, output):
        ......

class HARK_Separation(hark.NetworkDef):
    def build(self, network, input, output):
        ......

class HARK_Recognition(hark.NetworkDef):
    def build(self, network, input, output):
        ......

class HARK_Main(hark.NetworkDef):
    def build(self, network, input, output):
        node_publisher = network.create(hark.node.PublishData, dispatch=hark.RepeatDispatcher, name="Publisher")
        node_subscriber = network.create(hark.node.SubscribeData, name="Subscriber")

        node_audio_stream_from_memory = network.create(hark.node.AudioStreamFromMemory, dispatch=hark.TriggeredMultiShotDispatcher)
        node_multi_fft = network.create(hark.node.MultiFFT)
        node_localization = network.create(HARK_Localization, name="HARK_Localization")
        node_separation = network.create(HARK_Separation, name="HARK_Separation")
        node_recognition = network.create(HARK_Recognition, name="HARK_Recognition")

        node_audio_stream_from_memory.add_input("INPUT", node_publisher["OUTPUT"]).add_input("CHANNEL_COUNT", 8)
        node_multi_fft.add_input("INPUT", node_audio_stream_from_memory["AUDIO"])
        node_localization.add_input("INPUT", node_multi_fft["OUTPUT"])
        node_separation.add_input("SPEC", node_multi_fft["OUTPUT"]).add_input("SRC_INFO", node_localization["OUTPUT"])
        node_recognition.add_input("SPEC", node_separation["SPEC"]).add_input("SRC_INFO", node_localization["OUTPUT"])
        node_subscriber.add_input("INPUT", node_separation["OUTPUT"])

        r = [node_publisher, node_subscriber, node_audio_stream_from_memory, node_multi_fft, node_localization, node_separation, node_recognition]
        return r
......
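
The `add_input` calls in `HARK_Main` define a dataflow graph. To make the dependencies explicit, the sketch below transcribes them into a plain dictionary and asks the standard library's `graphlib` for one valid processing order. This is only a way to read the wiring: how PyHARK actually schedules nodes is not described in this document, and the short node names are abbreviations chosen for the example.

```python
from graphlib import TopologicalSorter

# node -> set of nodes it reads from, transcribed from the add_input calls above
DEPENDS_ON = {
    "audio_stream_from_memory": {"publisher"},
    "multi_fft": {"audio_stream_from_memory"},
    "localization": {"multi_fft"},
    "separation": {"multi_fft", "localization"},
    "recognition": {"separation", "localization"},
    "subscriber": {"separation"},
}


def execution_order(depends_on):
    """Return one order in which every node comes after all of its inputs."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
print(order)
```

Any order returned this way starts with the publisher, because it is the only node without inputs.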

Preparation before running practice3-2r.py: starting the speech recognition engine

Start Kaldidecoder

cd practice2/data
sh 1_run_ASR.sh 
	[INFO] 
	This software includes work that is distributed under the Apache License 2.0 .
	......
	[INFO] Removed 1 orphan nodes.
	[INFO] Removing 2 orphan components.
	[INFO] Added 1 components, removed 2
	[INFO] Spent 0.1125 seconds in looped compilation.
	[INFO] Removed 1 orphan nodes.
	[INFO] Removing 2 orphan components.
	[INFO] Added 1 components, removed 2
	[INFO] Spent 0.107063 seconds in looped compilation.

Wait until startup has finished and the output settles.

Preparation before running practice3-2r.py: checking the audio devices

Open a terminal and list the audio devices.

cd practice3/data
python practice3-2r.py -l
	0 Ensoniq AudioPCI: ES1371 DAC2/ADC (hw:0,0), ALSA (2 in, 0 out)
	1 Ensoniq AudioPCI: ES1371 DAC1 (hw:0,1), ALSA (0 in, 2 out)
	2 TAMAGO-XX: USB Audio (hw:1,0), ALSA (8 in, 0 out)
	3 sysdefault, ALSA (128 in, 0 out)
	4 samplerate, ALSA (128 in, 0 out)
	5 speexrate, ALSA (128 in, 0 out)
	6 pulse, ALSA (32 in, 32 out)
	7 upmix, ALSA (8 in, 0 out)
	8 vdownmix, ALSA (6 in, 0 out)
      * 9 default, ALSA (32 in, 32 out)

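Note the name or number of the device you want to use (for example, TAMAGO-XX or 2).

Practice3-2: Running practice3-2r.py with Python

Run the program, specifying the input device (-d followed by the device number or device name).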
```
python practice3-2r.py -d 2 
```
Speak into the microphone.

Localization, separation, and recognition are performed.

[INFO] 5.77266 sec, RT=1.36793
source_id = 7, azimuth = 79.982063, elevation = 36.869900, sec = 1668056389, usec = 904003
### Recognition: 2nd pass (RL heuristic best-first) 
STAT: 00 
sentence1: 一 週間 ばかり ニューヨーク を 取材 し た 
wseq1: 一+名詞/数詞 週間+接尾辞 ばかり+助詞/副助詞 ニューヨーク+名詞/固有名詞 を+助詞/格助詞 取材+名詞 し+動詞/サ行変格/連用形 た+助動詞/連体形 
......

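Press Ctrl-C to quit.

If it does not work

Suspend the program with Ctrl-Z, then stop the job with, for example, kill %1.
Unplug the microphone array and plug it back in.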
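
The localization line in the sample output above (`source_id = 7, azimuth = ...`) is a list of `key = value` pairs, so it is easy to post-process. The following is a minimal sketch that assumes the line format stays exactly as printed in that sample; `parse_source_line` is a hypothetical helper, not part of HARK.

```python
import re

# The localization line copied from the sample output above.
LINE = ("source_id = 7, azimuth = 79.982063, elevation = 36.869900, "
        "sec = 1668056389, usec = 904003")

_PAIR = re.compile(r"(\w+)\s*=\s*(-?\d+(?:\.\d+)?)")


def parse_source_line(line):
    """Turn 'key = value, key = value, ...' into a dict of ints and floats."""
    result = {}
    for key, value in _PAIR.findall(line):
        result[key] = float(value) if "." in value else int(value)
    return result


info = parse_source_line(LINE)
print(info["source_id"], info["azimuth"], info["elevation"])
```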

Practice3-2: PyHARK online sound source localization, separation, and speech recognition program

Program structure of practice3-2r.py

#!/usr/bin/env python
import hark                                 # import the required modules
import ...

class HARK_Localization(hark.NetworkDef):   # sound source localization subnetwork
    ......
class HARK_Separation(hark.NetworkDef):     # sound source separation subnetwork
    ......
class HARK_Recognition(hark.NetworkDef):    # speech recognition subnetwork
    ......
class HARK_Main(hark.NetworkDef):           # main network
    ......
def main():                                 # main function
    ......

if __name__ == '__main__':
    main()

Sound source localization subnetwork (already explained)
Sound source separation subnetwork
Speech recognition subnetwork
Main network

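Practice3-2: PyHARK sound source separation subnetwork

The sound source separation subnetwork in HARK Designer

Convert it into a Python program with PyHARK, following these guidelines:

Every network is defined as a class that inherits from hark.NetworkDef
The structure of the network is described in the build() method
The required nodes are created with network.create
Connections between nodes are defined on the input terminal with node_xxx.add_input("NAME", node_yyy["NAME"])
Node parameters are defined with node_xxx.add_input("PARAM", VALUE)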
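
The guidelines above rely on `add_input` calls being chained one after another. A chain like that works when each call returns the node itself. The toy class below illustrates the idea; it is a hypothetical stand-in written for this explanation, not the PyHARK implementation, and it only records what was connected.

```python
class Node:
    """Hypothetical stand-in for a network node (not the PyHARK class)."""

    def __init__(self, name):
        self.name = name
        self.inputs = {}

    def add_input(self, terminal, value):
        """Record a parameter value or a connection, then return self for chaining."""
        self.inputs[terminal] = value
        return self

    def __getitem__(self, terminal):
        """node["NAME"] refers to one output terminal of this node."""
        return (self.name, terminal)


ghdss = Node("GHDSS")
synthesize = Node("Synthesize")

# Parameters: the value is a plain Python object.
ghdss.add_input("LC_CONST", "DIAG").add_input("UPDATE_METHOD_W", "ID")
# Connection: the value is a reference to another node's output terminal.
synthesize.add_input("INPUT", ghdss["OUTPUT"])

print(ghdss.inputs)
print(synthesize.inputs)
```

Parameters and connections go through the same call; only the kind of value differs.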

Python program for the HARK_Separation class

class HARK_Separation(hark.NetworkDef):     # sound source separation subnetwork
    def build(self,                         # the network structure is described in build()
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes
        try:
            node_ghdss = network.create(hark.node.GHDSS)                # GHDSS
            node_synthesize = network.create(hark.node.Synthesize)      # Synthesize
            node_save_wave_pcm = network.create(hark.node.SaveWavePCM)  # SaveWavePCM

        except BaseException as ex:
            print(ex)

        # Describe the connections between nodes (the data flow) and the parameters,
        # and build a list of the nodes that belong to the network
        try:
            r = [
                node_ghdss                  # GHDSS: connections and parameters
                    .add_input("INPUT_FRAMES", input["SPEC"])
                    .add_input("INPUT_SOURCES", input["SRC_INFO"])
                    .add_input("TF_INPUT_TYPE", "FILE")
                    .add_input("TF_CONJ_FILENAME", "tf.zip")
                    .add_input("LC_CONST", "DIAG")
                    .add_input("UPDATE_METHOD_W", "ID")
                ,
                node_synthesize             # Synthesize: connections
                    .add_input("INPUT", node_ghdss["OUTPUT"])
                ,
                node_save_wave_pcm          # SaveWavePCM: connections and parameters
                    .add_input("INPUT", node_synthesize["OUTPUT"])
                    .add_input("BASENAME", "ghdss_{srcid}_")
                ,
            ]

            output.add_input("OUTPUT", node_ghdss["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))

        # Return the list of nodes
        return r

Practice3-2: Python speech recognition network

Program structure of practice3-2r.py

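Sound source localization subnetwork (already explained)
Sound source separation subnetwork (just explained)
Speech recognition subnetwork
Main network

Practice3-2: PyHARK speech recognition subnetwork

The speech recognition subnetwork in HARK Designer

Convert it into a Python program with PyHARK, following these guidelines:

Every network is defined as a class that inherits from hark.NetworkDef
The structure of the network is described in the build() method
The required nodes are created with network.create
Connections between nodes are defined on the input terminal with node_xxx.add_input("NAME", node_yyy["NAME"])
Node parameters are defined with node_xxx.add_input("PARAM", VALUE)

Python program for the HARK_Recognition class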

class HARK_Recognition(hark.NetworkDef):    # speech recognition subnetwork
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        # Create the required nodes
        try:
            node_pre_emphasis = network.create(hark.node.PreEmphasis)        # PreEmphasis
            node_mel_filter_bank = network.create(hark.node.MelFilterBank)   # MelFilterBank
            node_msls_extraction = network.create(hark.node.MSLSExtraction)  # MSLSExtraction
            node_delta = network.create(hark.node.Delta)                     # Delta
            node_feature_remover = network.create(hark.node.FeatureRemover)  # FeatureRemover
            node_spectral_mean_normalization_incremental = network.create(
                hark.node.SpectralMeanNormalizationIncremental)              # SpectralMeanNormalizationIncremental
            node_speech_recognition_client = network.create(
                hark.node.SpeechRecognitionClient)                           # SpeechRecognitionClient
            node_save_features = network.create(hark.node.SaveHTKFeatures)   # SaveHTKFeatures

        except BaseException as ex:
            print(ex)

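        # Describe the connections between nodes (the data flow) and the parameters,
        # and build a list of the nodes that belong to the network
        try:
            r = [
                node_pre_emphasis       # PreEmphasis: connections and parameters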
                    .add_input("INPUT", input["SPEC"])
                    .add_input("INPUT_TYPE", "SPECTRUM")
                ,
                node_mel_filter_bank    # MelFilterBank: connections and parameters
                    .add_input("INPUT", node_pre_emphasis["OUTPUT"])
                    .add_input("FBANK_COUNT", 40)
                ,
                node_msls_extraction    # MSLSExtraction: connections and parameters
                    .add_input("FBANK", node_mel_filter_bank["OUTPUT"])
                    .add_input("SPECTRUM", node_pre_emphasis["OUTPUT"])
                    .add_input("FBANK_COUNT", 40)
                    .add_input("NORMALIZATION_MODE", "SPECTRAL")
                    .add_input("USE_POWER", True)
                ,
                node_delta              # Delta: connections
                    .add_input("INPUT", node_msls_extraction["OUTPUT"])
                ,
                node_feature_remover    # FeatureRemover: connections and parameters
                    .add_input("INPUT", node_delta["OUTPUT"])
                    .add_input("SELECTOR", " ".join(map(str, range(40, 81+1))))
                ,
                # SpectralMeanNormalizationIncremental: connections and parameters
                node_spectral_mean_normalization_incremental
                    .add_input("INPUT", node_feature_remover["OUTPUT"])
                    .add_input("SM_HISTORY", False) # ToDo
                    .add_input("NOT_EOF", True) # ToDo
                    .add_input("FBANK_COUNT", 40)
                    .add_input("PERIOD", 1)
                    .add_input("BASENAME", "")
                    .add_input("OUTPUT_FNAME", "")
                ,
                node_speech_recognition_client  # SpeechRecognitionClient: connections and parameters
                    .add_input("FEATURES", node_spectral_mean_normalization_incremental["OUTPUT"])
                    .add_input("MASKS", node_spectral_mean_normalization_incremental["OUTPUT"])
                    .add_input("SOURCES", input["SOURCE"])
                    #.add_input("MFM_ENABLED", False)
                    .add_input("HOST", "localhost")
                    #.add_input("PORT", 5530)
                    .add_input("SOCKET_ENABLED", True)
                ,
                node_save_features              # SaveHTKFeatures: connections and parameters
                    .add_input("FEATURES", node_spectral_mean_normalization_incremental["OUTPUT"])
                    .add_input("SOURCES", input["SOURCE"])
                    .add_input("BASENAME", "feature_")
                ,
            ]

            output.add_input("OUTPUT", node_spectral_mean_normalization_incremental["OUTPUT"])

        except BaseException as ex:
            print('error: {}'.format(ex))

        # Return the list of nodes
        return r
		


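Practice3-3: PyHARK offline sound source localization, separation, and recognition system

Practice3-2: Python program (overall structure)

Practice3-2: The Python main function

Structure of the main function

Definition of the inputs in the main function

The publisher and return values in the main function

Practice3-2: Speech recognition with PyHARK (a PyHARK version of Practice2)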

Practice3-2: PyHARK sound source separation subnetwork

The HARK_Separation class

class HARK_Separation(hark.NetworkDef):
    def build(......):
        ......
        # Build a list of the nodes used and return it from build()
        r = [
            node_ghdss,
            node_synthesize,
            node_save_wave_pcm,
        ]
        return r

Inputs and outputs of the HARK_Separation class

class HARK_Separation(hark.NetworkDef):
    def build(......):
        ......
        # Build a list of the nodes used and return it from build()
        r = [
            node_ghdss,
            node_synthesize,
            node_save_wave_pcm,
        ]
        return r

Practice3-2: Python speech recognition

Practice3-2: PyHARK speech recognition subnetwork

class HARK_Recognition(hark.NetworkDef):
    def build(self,
              network: hark.Network,
              input:   hark.DataSourceMap,
              output:  hark.DataSinkMap):

        node_feature_remover = network.create(hark.node.FeatureRemover)
        node_delta = network.create(hark.node.Delta)
        node_mel_filter_bank = network.create(hark.node.MelFilterBank)
        node_msls_extraction = network.create(hark.node.MSLSExtraction)
        node_pre_emphasis = network.create(hark.node.PreEmphasis)
        node_spectral_mean_normalization = network.create(
            hark.node.SpectralMeanNormalizationIncremental)
        node_speech_recognition_client = network.create(hark.node.SpeechRecognitionClient)

Nodes of HARK_Recognition

Header

class HARK_Recognition(hark.NetworkDef):
    def build(......):

Create the required nodes

        node_pre_emphasis = network.create(hark.node.PreEmphasis)
        node_mel_filter_bank = network.create(hark.node.MelFilterBank)
        node_msls_extraction = network.create(hark.node.MSLSExtraction)
        node_delta = network.create(hark.node.Delta)
        node_feature_remover = network.create(hark.node.FeatureRemover)
        node_spectral_mean_normalization = network.create(hark.node.SpectralMeanNormalizationIncremental)
        node_speech_recognition_client = network.create(hark.node.SpeechRecognitionClient)
        ......

HARK_Recognition: connections between nodes (msls_extractor)

Header

class HARK_Recognition(hark.NetworkDef):
    def build(self, network, input, output):
        ......

Describe the connections between nodes and their parameters

        (node_pre_emphasis
         .add_input("INPUT", input["SPECC"])
         .add_input("INPUT_TYPE", "SPECTRUM"))
        (node_mel_filter_bank
         .add_input("INPUT", node_pre_emphasis["OUTPUT"])
         .add_input("FBANK_COUNT", 40))
        ......
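
The parenthesized chains above only work if each add_input call hands back the node it was called on. The sketch below shows that pattern in plain Python with a hypothetical stand-in class, FakeNode. It is not PyHARK code and makes no claim about how hark nodes are implemented internally; it only illustrates why the calls can be stacked one after another.

```python
class FakeNode:
    """Hypothetical stand-in for a node; not part of PyHARK."""

    def __init__(self, name):
        self.name = name
        self.inputs = {}

    def add_input(self, terminal, value):
        # Record the connection or parameter, then return self so calls can be chained.
        self.inputs[terminal] = value
        return self

    def __getitem__(self, terminal):
        # Mimic node["OUTPUT"]: return a reference to one of this node's terminals.
        return (self.name, terminal)


pre_emphasis = FakeNode("pre_emphasis")
mel_filter_bank = FakeNode("mel_filter_bank")

(mel_filter_bank
 .add_input("INPUT", pre_emphasis["OUTPUT"])
 .add_input("FBANK_COUNT", 40))

print(mel_filter_bank.inputs)
```

The same call shape covers both links (a terminal of another node) and parameters (a plain value), which matches how the two are written in the listing above.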

HARK_Recognition: connections between nodes (feature_remover)

Header

class HARK_Recognition(hark.NetworkDef):
    def build(self, network, input, output):
        ......

Describe the connections between nodes and their parameters

        (node_feature_remover
         .add_input("INPUT", node_delta["OUTPUT"])
         .add_input("SELECTOR", " ".join(map(str, range(40, 81+1)))))  # "40 41 42 ... 81"
        (node_spectral_mean_normalization
         .add_input("INPUT", node_feature_remover["OUTPUT"])
         .add_input("NOT_EOF", True)
         .add_input("SM_HISTORY", False)
         .add_input("PERIOD", 1)
         .add_input("BASENAME", "")
         .add_input("OUTPUT_FNAME", ""))
        ......
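
The SELECTOR expression above builds a space-separated string of the indices 40 through 81. Evaluating it on its own in plain Python, with no HARK dependency, makes the resulting value and the number of listed indices easy to check:

```python
# Same expression as the SELECTOR argument, evaluated stand-alone.
selector = " ".join(map(str, range(40, 81 + 1)))
indices = selector.split()

print(selector)
print(len(indices))  # number of indices listed in the string
```

Writing the upper bound as 81+1 keeps the last index visible in the code, since range() excludes its stop value.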

HARK_Recognition: connection to the ASR client

Header: input is the input to the subnetwork

class HARK_Recognition(hark.NetworkDef):
    def build(self, network, input, output):
        ......

Describe the connections between nodes and their parameters

        (node_speech_recognition_client
         .add_input("FEATURES", node_spectral_mean_normalization["OUTPUT"])
         .add_input("MASKS", node_spectral_mean_normalization["OUTPUT"])
         .add_input("SOURCES", input["SRC_INFO"])
         .add_input("MFM_ENABLED", False)
         .add_input("HOST", "localhost")
         .add_input("PORT", 5530)
         .add_input("SOCKET_ENABLED", True))
        ......

Subnetwork output

class HARK_Recognition(hark.NetworkDef):
    def build(self, network, input, output):
        ......

Describe the connections between nodes and their parameters

        output.add_input("OUTPUT",
                         node_speech_recognition_client["OUTPUT"])

Collect the required nodes into a list and return it from build()

        r = [
            node_white_noise_adder,
            node_pre_emphasis,
            node_mel_filter_bank,
            node_msls_extraction,
            node_delta,
            node_spectral_mean_normalization,
            node_speech_recognition_client,
        ]
        return r

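Practice3-3: PyHARK offline speech recognition

Differences between online and offline processing

Online processing

Processes data sequentially, one frame at a time
Nearly the same as processing by HARK Middleware
Well suited to real-time processing with microphone input

Offline processing (batch processing)

Processes all of the data at once
Fits naturally with ordinary Python programming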
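
The contrast between the two styles can be sketched in plain NumPy, without PyHARK. The per-frame operation here (mean power, in the hypothetical helper frame_power) and the random input are arbitrary choices made for illustration; the point is only that the same computation can be driven frame by frame or applied to every frame in one call.

```python
import numpy as np


def frame_power(frame):
    # Mean power of one frame (an arbitrary per-frame operation for illustration).
    return float(np.mean(frame ** 2))


rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 512))  # 5 dummy frames of 512 samples each

# Online style: frames arrive one at a time and are processed as they come.
online_result = []
for frame in frames:
    online_result.append(frame_power(frame))

# Offline (batch) style: all frames are available, so one vectorized call handles them.
offline_result = np.mean(frames ** 2, axis=1)

print(np.allclose(online_result, offline_result))
```

The online loop can start producing results before the input has ended, which is what makes it suitable for microphone input; the batch form needs the whole array up front but reads like ordinary array code.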

Practice3-3: Running it

Start KaldiDecoder

cd practice2/data
sh 1_run_ASR.sh
	[INFO] 
	This software includes work that is distributed under the Apache License 2.0 .
	......
	[INFO] Removed 1 orphan nodes.
	[INFO] Removing 2 orphan components.
	[INFO] Added 1 components, removed 2
	[INFO] Spent 0.1125 seconds in looped compilation.
	[INFO] Removed 1 orphan nodes.
	[INFO] Removing 2 orphan components.
	[INFO] Added 1 components, removed 2
	[INFO] Spent 0.107063 seconds in looped compilation.

Wait until startup finishes and the log output settles.
Run practice3-3 with input.wav as the input file.

python practice3-3.py input.wav

Localization, separation, and recognition are performed.

[INFO] 5.77266 sec, RT=1.36793
source_id = 7, azimuth = 79.982063, elevation = 36.869900, sec = 1668056389, usec = 904003
### Recognition: 2nd pass (RL heuristic best-first) 
STAT: 00 
sentence1: 一 週間 ばかり ニューヨーク を 取材 し た 
wseq1: 一+名詞/数詞 週間+接尾辞 ばかり+助詞/副助詞 ニューヨーク+名詞/固有名詞 を+助詞/格助詞 取材+名詞 し+動詞/サ行変格/連用形 た+助動詞/連体形 
	......
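
If the localization lines in this output need to be post-processed, they can be parsed with a small regular expression. The following is a minimal sketch, assuming the line format stays exactly as in the sample above; parse_source_line is a hypothetical helper, not part of PyHARK, and treating sec/usec as a Unix timestamp is also an assumption.

```python
import re

# One localization line copied from the sample output above.
line = ("source_id = 7, azimuth = 79.982063, elevation = 36.869900, "
        "sec = 1668056389, usec = 904003")


def parse_source_line(text):
    # Collect "key = number" pairs; values with a decimal point become float, others int.
    info = {}
    for key, value in re.findall(r"(\w+) = (-?\d+(?:\.\d+)?)", text):
        info[key] = float(value) if "." in value else int(value)
    return info


info = parse_source_line(line)
# Assumption: sec and usec together form a time in seconds since the Unix epoch.
timestamp = info["sec"] + info["usec"] / 1e6
print(info["source_id"], info["azimuth"], info["elevation"], timestamp)
```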

Practice3-3: Python program

File input module

Import the required modules

#!/usr/bin/env python
import sys
import numpy as np
import soundfile as sf
from numpy.lib.stride_tricks import sliding_window_view
import hark

Command-line argument handling

def main():
	if len(sys.argv) < 2:
		print("no input file")
		return
	wavfilename = sys.argv[1]

Read the WAV file and split it into frames

audio, rate = sf.read(wavfilename, dtype=np.float32)
nch = audio.shape[1]
frame_size, advance = 512, 160
frames = sliding_window_view(audio, frame_size, axis=0)[::advance, :, :]
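
To see what the framing step produces, the same slicing can be run on dummy data. This is a sketch under assumed values: one second of silent 8-channel audio at 16 kHz stands in for the WAV file, so neither soundfile nor hark is needed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rate, nch = 16000, 8                              # assumed values for illustration
audio = np.zeros((rate, nch), dtype=np.float32)   # one second of silent dummy audio
frame_size, advance = 512, 160

# Same expression as in the program: windows along the time axis, keeping every advance-th one.
frames = sliding_window_view(audio, frame_size, axis=0)[::advance, :, :]

# One frame per `advance` samples for as long as a full window still fits.
expected = (audio.shape[0] - frame_size) // advance + 1
print(frames.shape, expected)
```

The resulting array is indexed as (frame, channel, sample). Because sliding_window_view returns a view, the overlapping frames share memory with audio rather than being copied.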

Sound source localization module

FFT over all frames at once

multi_fft = hark.node.MultiFFT()
spec = multi_fft(INPUT=frames)
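
As a rough guide to the sizes involved, a one-sided FFT of a 512-sample frame yields 512/2 + 1 = 257 frequency bins, the same count that appears as frame_size//2+1 when the noise correlation matrix is built. The sketch below uses numpy.fft.rfft purely as a stand-in to show that bin count. It is not HARK's MultiFFT, which may window the frames and arrange its output differently.

```python
import numpy as np

frame_size, nch, nframes = 512, 8, 97     # assumed sizes for illustration
frames = np.zeros((nframes, nch, frame_size), dtype=np.float32)

# NumPy stand-in for a per-frame, per-channel one-sided FFT (not HARK's MultiFFT).
spec = np.fft.rfft(frames, axis=-1)
print(spec.shape)
```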

Create an identity matrix to use as the noise correlation matrix

noise_cm = np.broadcast_to(
		np.eye(nch, dtype=np.complex64).flatten(), 
		(frames.shape[0], frame_size//2+1, nch*nch))
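
The broadcast above can be checked in isolation. The sketch below assumes 8 channels and 97 frames for illustration and confirms that every (frame, frequency bin) entry is the same flattened identity matrix.

```python
import numpy as np

nch, frame_size, nframes = 8, 512, 97     # assumed sizes for illustration

identity_flat = np.eye(nch, dtype=np.complex64).flatten()   # length nch*nch
noise_cm = np.broadcast_to(identity_flat, (nframes, frame_size // 2 + 1, nch * nch))

print(noise_cm.shape)
# Every (frame, frequency bin) entry reshapes back to the same nch x nch identity matrix.
print(np.array_equal(noise_cm[0, 0].reshape(nch, nch), np.eye(nch)))
# broadcast_to returns a read-only view, so no per-frame copies are allocated.
print(noise_cm.flags.writeable)
```

Because the result is a read-only view, any code that needs to modify the matrix in place would first have to take a copy with noise_cm.copy().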

Sound source localization with the MUSIC method

localize_music = hark.node.LocalizeMUSIC()
music_spec = localize_music(
		INPUT=spec.OUTPUT, 
		A_MATRIX='tf.zip', 
		MUSIC_ALGORITHM='SEVD',
		NOISECM=noise_cm, 
		WINDOW_TYPE='PAST', 
		ENABLE_OUTPUT_SPECTRUM=True)

Sound source tracking and separation modules

Sound source tracking

source_tracker = hark.node.SourceTracker()
src_info = source_tracker(
		INPUT=music_spec.OUTPUT, 
		THRESH=26.0, 
		PAUSE_LENGTH=1200.0, 
		MIN_SRC_INTERVAL=20.0)

Sound source separation

ghdss = hark.node.GHDSS()
ghdss_output = ghdss(
		INPUT_FRAMES=spec.OUTPUT, 
		INPUT_SOURCES=src_info.OUTPUT, 
		TF_CONJ_FILENAME='tf.zip')

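Separated sound saving and acoustic feature extraction modules

Saving separated sound: convert the separated spectrum to an audio signal and save it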

synthesize = hark.node.Synthesize()
synthesize_output = synthesize(INPUT=ghdss_output.OUTPUT)
save_wave_pcm = hark.node.SaveWavePCM()
save_wave_pcm_output = save_wave_pcm(INPUT=synthesize_output.OUTPUT)

Acoustic feature extraction

pre_emphasis = hark.node.PreEmphasis()
pre_emphasized_spectrum = pre_emphasis(INPUT=ghdss_output.OUTPUT, INPUT_TYPE="SPECTRUM")
mel_filter_bank = hark.node.MelFilterBank()
mel_spectrum = mel_filter_bank(INPUT=pre_emphasized_spectrum.OUTPUT, FBANK_COUNT=40)
msls_extraction = hark.node.MSLSExtraction()
msls = msls_extraction(
		FBANK=mel_spectrum.OUTPUT,
		SPECTRUM=pre_emphasized_spectrum.OUTPUT,
		FBANK_COUNT=40,
		NORMALIZATION_MODE="SPECTRAL",
		USE_POWER=True)

Acoustic feature extraction and speech recognition modules

Acoustic feature extraction

feature_remover = hark.node.FeatureRemover()
asr_features = feature_remover(INPUT=msls.OUTPUT, SELECTOR=" ".join(map(str, range(40, 81+1))))
smn = hark.node.SpectralMeanNormalizationIncremental()
normalized_features = smn(INPUT=asr_features.OUTPUT, NOT_EOF=True, SM_HISTORY=False, PERIOD=1)

Speech recognition interface

speech_recognition_client = hark.node.SpeechRecognitionClient()
asr_result = speech_recognition_client(
		FEATURES=normalized_features.OUTPUT,
		MASKS=normalized_features.OUTPUT, 
		SOURCES=src_info.OUTPUT,
		MFM_ENABLED=False, 
		HOST="localhost", 
		PORT=5530, 
		SOCKET_ENABLED=True)

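Practice3 summary

PyHARK: using HARK from Python
Online processing and offline processing

Online processing: describe the data flow in advance, then process the data frame by frame.
Offline processing: apply each processing step to all of the data at once.

This concludes the tutorial. Thank you for your hard work.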
