{"id":4535,"date":"2020-12-03T09:22:17","date_gmt":"2020-12-03T08:22:17","guid":{"rendered":"http:\/\/www.blue-bears.com\/blog\/?p=4535"},"modified":"2020-12-03T14:18:51","modified_gmt":"2020-12-03T13:18:51","slug":"ocr-pdf-tesseract-comptes-detailles","status":"publish","type":"post","link":"http:\/\/www.blue-bears.com\/blog\/?p=4535","title":{"rendered":"OCR PDF Tesseract Detailed Accounts"},"content":{"rendered":"<ul>\n<li>Need: digitize the detailed accounts of companies (see liasses fiscales, the corporate tax filings) and retrieve the data in formats usable in Excel (Txt, Csv,&#8230;)<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<ul>\n<li>General problem =&gt; data entry from scanned forms =&gt; image format (jpeg, png, gif&#8230;) or pdf<\/li>\n<li>Form-specific problem =&gt; form recognition<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<ul>\n<li>Idea: use PHP \/ OCR =&gt; Tesseract component (the reference open-source software)\n<ul>\n<li><a href=\"https:\/\/github.com\/thiagoalessio\/tesseract-ocr-for-php\">https:\/\/github.com\/thiagoalessio\/tesseract-ocr-for-php<\/a><\/li>\n<li><a href=\"https:\/\/fr.wikipedia.org\/wiki\/Tesseract_(logiciel)\">https:\/\/fr.wikipedia.org\/wiki\/Tesseract_(logiciel)<\/a><\/li>\n<\/ul>\n<\/li>\n<li>For page recognition =&gt; handle this by converting a PDF document into a single tall image file (concatenating the page images into one image file).<\/li>\n<li>For block recognition =&gt; use a neural network?\n<ul>\n<li><a href=\"https:\/\/www.php.net\/manual\/en\/book.fann.php\">https:\/\/www.php.net\/manual\/en\/book.fann.php<\/a><\/li>\n<li>Tesseract 4.0 also uses an LSTM neural network<\/li>\n<\/ul>\n<\/li>\n<li>For an equivalent problem =&gt; solutions:\n<ul>\n<li><a 
href=\"https:\/\/stackoverflow.com\/questions\/5041038\/is-there-an-ocr-library-that-outputs-coordinates-of-words-found-within-an-image\">https:\/\/stackoverflow.com\/questions\/5041038\/is-there-an-ocr-library-that-outputs-coordinates-of-words-found-within-an-image<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Need: digitize the detailed accounts of companies (see liasses fiscales, the corporate tax filings) and retrieve the data in formats usable in Excel (Txt, Csv,&#8230;) &nbsp; [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-4535","post","type-post","status-publish","format-standard","hentry","category-non-classe"],"_links":{"self":[{"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4535","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4535"}],"version-history":[{"count":5,"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4535\/revisions"}],"predecessor-version":[{"id":4540,"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4535\/revisions\/4540"}],"wp:attachment":[{"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4535"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%
2Fcategories&post=4535"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.blue-bears.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4535"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}