Sunday, June 27, 2021

tesseract

Tesseract is an OCR library that also includes a command-line tool.
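To illustrate the command-line side, the function below wraps the `tesseract IMAGE OUTPUTBASE -l LANGS` invocation. Passing `stdout` as the output base makes tesseract print the recognized text instead of writing a `.txt` file. The helper name `ocr_image`, its `eng` default and the image `scan.png` in the usage line are illustrative assumptions, not part of tesseract.

```shell
#!/usr/bin/env bash
# ocr_image IMAGE [LANGS]: print the text tesseract recognizes in IMAGE.
# Hypothetical helper; LANGS follows tesseract's -l syntax, e.g. "chi_tra+eng".
ocr_image() {
  local image="$1" langs="${2:-eng}"
  # Fail early with a clear message instead of handing tesseract a bad path
  if [ ! -r "$image" ]; then
    echo "ocr_image: cannot read '$image'" >&2
    return 2
  fi
  # "stdout" as the output base sends the recognized text to standard output
  tesseract "$image" stdout -l "$langs"
}

# Usage (scan.png is a placeholder image):
#   ocr_image scan.png chi_tra+eng > scan.txt
```

The `+` syntax in `-l` asks tesseract to load several language packs for the same image, which is useful for mixed Chinese and English documents.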

Installing the packages on CentOS 8
Because of dependencies, the PowerTools repository must be enabled first.
dnf config-manager --set-enabled powertools
dnf -y install tesseract tesseract-langpack-chi_tra tesseract-langpack-chi_sim
(The last two are the Traditional Chinese and Simplified Chinese language packs.)
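To confirm the language packs were picked up, you can check the output of `tesseract --list-langs`, which prints a header line followed by one language name per line. The `missing_langs` function below is a hypothetical helper: it reads such a list on standard input and prints every requested language that is absent. The `2>&1` in the usage line is there in case your tesseract version writes the list to stderr.

```shell
#!/usr/bin/env bash
# missing_langs LANG...: read a language list on stdin (one name per line, as
# printed by `tesseract --list-langs`) and print each requested LANG not in it.
# Hypothetical helper, not part of tesseract.
missing_langs() {
  local available lang
  available="$(cat)"
  for lang in "$@"; do
    # -x: whole-line match, -F: literal string; the header line never matches
    printf '%s\n' "$available" | grep -qxF -- "$lang" || echo "$lang"
  done
}

# Usage after the dnf install above; prints nothing when all packs are present:
#   tesseract --list-langs 2>&1 | missing_langs eng chi_tra chi_sim
```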

Building from source:
This needs a newer GCC. Install one first by following the "CentOS upgrade GCC" post.
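# Switch to the GCC 9 environment (devtoolset-9, as set up in the post above; on CentOS 8 the Software Collection is named gcc-toolset-9)
scl enable devtoolset-9 bash
# Install the image-format devel packages first: leptonica only builds PNG/JPEG/TIFF support if it finds them at configure time, and tesseract reads images through leptonica
dnf -y install libtiff-devel libjpeg-devel libpng-devel
wget http://www.leptonica.org/source/leptonica-1.81.1.tar.gz
tar zxvf leptonica-1.81.1.tar.gz
cd leptonica-1.81.1
./configure && make && make install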
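# The git repository ships no pre-generated configure script; autogen.sh creates it and needs autoconf, automake and libtool
dnf -y install autoconf automake libtool
git clone https://github.com/tesseract-ocr/tesseract
cd tesseract
./autogen.sh
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./configure && make && make install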
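`make install` puts the tesseract and leptonica shared libraries under /usr/local/lib, which is typically not in CentOS's default library search path, so running tesseract may fail with an error about loading shared libraries. A minimal sketch of the usual fix, assuming the default /usr/local prefix used above: list the directory in a file under /etc/ld.so.conf.d and run ldconfig as root. The helper `add_ld_dir` and the file name `usr-local.conf` are hypothetical choices.

```shell
#!/usr/bin/env bash
# add_ld_dir DIR CONF: append DIR to the ld.so config file CONF unless it is
# already listed, so running the step twice does not duplicate the entry.
# Hypothetical helper.
add_ld_dir() {
  local dir="$1" conf="$2"
  touch "$conf" || return 1
  grep -qxF -- "$dir" "$conf" || echo "$dir" >> "$conf"
}

# Usage as root, after `make install`:
#   add_ld_dir /usr/local/lib /etc/ld.so.conf.d/usr-local.conf && ldconfig
#   tesseract --version
```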
# Download the language packs
cd /usr/local/share/tessdata
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_tra.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_tra_vert.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim_vert.traineddata
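The five wget commands above can also be written as a loop that skips packs already on disk and deletes the partial file if a download fails. This is a sketch. `fetch_langs`, `traineddata_url` and `TESSDATA_BASE_URL` are hypothetical names, and the base URL is the one used above.

```shell
#!/usr/bin/env bash
# Base URL used by the wget commands above (tessdata repository, master branch)
TESSDATA_BASE_URL="https://github.com/tesseract-ocr/tessdata/raw/master"

# traineddata_url LANG: print the download URL of one language pack
traineddata_url() {
  printf '%s/%s.traineddata\n' "$TESSDATA_BASE_URL" "$1"
}

# fetch_langs DIR LANG...: download each LANG.traineddata into DIR,
# skipping files that already exist and are non-empty
fetch_langs() {
  local dir="$1" lang target
  shift
  for lang in "$@"; do
    target="$dir/$lang.traineddata"
    if [ -s "$target" ]; then
      echo "skip $lang (already present)"
      continue
    fi
    # Remove the partial file left by a failed download, then stop
    wget -q -O "$target" "$(traineddata_url "$lang")" || { rm -f "$target"; return 1; }
  done
}

# Usage (same packs as above):
#   fetch_langs /usr/local/share/tessdata eng chi_tra chi_tra_vert chi_sim chi_sim_vert
```

The `_vert` packs are the vertical-text variants of the Chinese models, so they only need to be downloaded if you process vertically typeset pages.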

