{"id":5446,"date":"2025-06-14T12:31:40","date_gmt":"2025-06-14T10:31:40","guid":{"rendered":"https:\/\/www.blue-bears.com\/blog\/?p=5446"},"modified":"2025-10-18T13:13:49","modified_gmt":"2025-10-18T11:13:49","slug":"installation-wisper-openai-sous-debian","status":"publish","type":"post","link":"http:\/\/www.blue-bears.com\/blog\/?p=5446","title":{"rendered":"Installation Wisper OpenAI sous DEBIAN"},"content":{"rendered":"<h4>Installation :<\/h4>\n<p>Mise \u00e0 jour paquets :<\/p>\n<ul>\n<li class=\"bg-black\" data-line=\"15\"><code class=\"language-bash\"><span class=\"code-block\">sudo apt update<\/span><\/code><\/li>\n<li class=\"bg-black\" data-line=\"15\"><code class=\"language-bash\"><span class=\"code-block\">sudo apt upgrade -y<\/span><\/code><\/li>\n<li class=\"bg-black\" data-line=\"15\"><code class=\"language-bash\"><span class=\"code-block\">sudo apt install -y python3-pip python3-dev build-essential<\/span><\/code><\/li>\n<li class=\"bg-black\" data-line=\"28\"><code class=\"language-bash\"><span class=\"code-block\"><del>sudo apt install -y python3-venv<\/del><br \/>\n<\/span><\/code><\/li>\n<li class=\"bg-black\" data-line=\"28\"><del><code class=\"language-bash\"><span class=\"code-block\">python3 -m venv whisper-env<\/span><\/code><\/del><\/li>\n<\/ul>\n<p class=\"bg-black\" data-line=\"36\">Installation Torch (no cache : sauf si\u00a0 m\u00e9moire &gt; 4 GO)<\/p>\n<pre class=\"lang-py s-code-block\" style=\"padding-left: 40px;\"><code class=\"hljs language-python\" data-highlighted=\"yes\">pip install torch --no-cache-<span class=\"hljs-built_in\">dir<\/span><\/code><\/pre>\n<p>Installation FFMPEG<\/p>\n<pre class=\"bg-black\" style=\"padding-left: 40px;\" data-line=\"18\"><code class=\"language-bash\"><span class=\"code-block\">sudo apt install -y ffmpeg<\/span><\/code><\/pre>\n<p>Cr\u00e9er l\u2019environnement virtuel \/\/\/ \/ Ne fonctionne pas avec moins de 4Go de memoire vive (3.8G \u00e0 disposition !!!) 
for the Torch installation (installing Torch outside a Python environment does not work).

```bash
# struck through in the original post:
# source whisper-env/bin/activate
```

Check:

```bash
python3 --version
pip3 --version
```

Install Whisper:

```bash
pip install git+https://github.com/openai/whisper.git
```

#### Using Whisper => audio file size 25 MB MAX

Otherwise... the connection drops.

Fortunately the audio can be compressed (44 MB of MP3 becomes 9 MB of Ogg)... but the Ogg still ends in: connection terminated 🙁

```bash
ffmpeg -i audio.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip audio.ogg
```

The commands:

```bash
cd /home/YSALMON/audio/
whisper --model small "audio.mp3" --language French --fp16 False --output_format srt
whisper --model small "audio.mp3" --language French --fp16 False --output_format srt --task transcribe
whisper --model small "/home/YSALMON/audio/audio.mp3" --language French --fp16 False --output_format srt --task transcribe
whisper --model small "/home/YSALMON/audio/audio.ogg" --language French --fp16 False --output_format srt --task transcribe
```
Model sizes tested (Python API):

```python
# model = whisper.load_model("tiny.en")
# model = whisper.load_model("base.en")
# model = whisper.load_model("small")      # load the small model => OK
# model = whisper.load_model("medium.en")
# model = whisper.load_model("large")      # crash!
# model = whisper.load_model("medium")     # crash!
```

Output files are written to the current directory in the .json, .tsv, .txt, .vtt and .srt (subtitles) formats.

Usable in VLC => load the audio file:

- Subtitles menu => Add subtitle file (.srt, .txt, .vtt)
- Audio => Visualizations => Spectrometer (otherwise it does not work!)

#### Observations:

- Crash => disconnection => out of memory (the models are too large for the available RAM: only small fits)
- 1 minute of audio = 4 minutes of transcription time, measured on:
  - Linux 5.10.0-34-amd64 on x86_64
  - Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 4 cores
  - Memory 3.81 GiB

#### References:

- https://labex.io/fr/tutorials/linux-how-to-install-whisper-cli-on-linux-437909
- https://www.css.cnrs.fr/fr/whisper-pour-retranscrire-des-entretiens/
- https://whisper-api.com/docs/transcription-options/
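The observations above can be turned into quick back-of-the-envelope checks before launching a job. A minimal sketch based only on the figures reported in this post (12 kbit/s Opus, 25 MB upload limit, roughly 4x real time with the small model on this 4-core CPU); the helper names are made up for illustration, not part of Whisper.

```python
# Rough capacity estimates from the figures observed in this post.
# Helper names are illustrative, not part of Whisper or FFmpeg.

OPUS_BITRATE_BPS = 12_000          # -b:a 12k in the ffmpeg command above
UPLOAD_LIMIT_BYTES = 25 * 10**6    # 25 MB max before the connection drops

def estimated_ogg_bytes(duration_s: float) -> float:
    """Approximate .ogg size: bitrate (bits/s) * duration / 8 bits per byte."""
    return duration_s * OPUS_BITRATE_BPS / 8

def fits_upload_limit(duration_s: float) -> bool:
    """True if the compressed file should stay under the 25 MB limit."""
    return estimated_ogg_bytes(duration_s) < UPLOAD_LIMIT_BYTES

def estimated_transcription_minutes(audio_minutes: float, ratio: float = 4.0) -> float:
    """Wall-clock estimate with the small model on CPU (observed ratio ~4x)."""
    return audio_minutes * ratio

# One hour of audio: ~5.4 MB of Opus, ~4 h of transcription on this machine.
print(estimated_ogg_bytes(3600))            # 5400000.0
print(estimated_transcription_minutes(60))  # 240.0
```

At 12 kbit/s, even several hours of voice recording stays far below the 25 MB limit; transcription time, not file size, is the practical bottleneck here.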
#### The commands (`whisper --help`):

Notes on a few options (from the post):

- `--beam_size` (5 by default): number of hypotheses considered => increasing it improves quality but also increases memory use and compute time.
- `--patience` (1.0 by default): time allowed to search for a better hypothesis.
- `--length_penalty` (1.0 by default): increase it to lengthen the output.
- `--initial_prompt`: the context, e.g. `initial_prompt=Meeting transcript:`

Full help output:

```
root@vxxxx:/home/YSALMON/audio# whisper --help
usage: whisper [-h] [--model MODEL] [--model_dir MODEL_DIR] [--device DEVICE]
               [--output_dir OUTPUT_DIR]
               [--output_format {txt,vtt,srt,tsv,json,all}]
               [--verbose VERBOSE] [--task {transcribe,translate}]
               [--language LANGUAGE]
               [--temperature TEMPERATURE] [--best_of BEST_OF]
               [--beam_size BEAM_SIZE] [--patience PATIENCE]
               [--length_penalty LENGTH_PENALTY]
               [--suppress_tokens SUPPRESS_TOKENS]
               [--initial_prompt INITIAL_PROMPT]
               [--carry_initial_prompt CARRY_INITIAL_PROMPT]
               [--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT]
               [--fp16 FP16]
               [--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK]
               [--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD]
               [--logprob_threshold LOGPROB_THRESHOLD]
               [--no_speech_threshold NO_SPEECH_THRESHOLD]
               [--word_timestamps WORD_TIMESTAMPS]
               [--prepend_punctuations PREPEND_PUNCTUATIONS]
               [--append_punctuations APPEND_PUNCTUATIONS]
               [--highlight_words HIGHLIGHT_WORDS]
               [--max_line_width MAX_LINE_WIDTH]
               [--max_line_count MAX_LINE_COUNT]
               [--max_words_per_line MAX_WORDS_PER_LINE]
               [--threads THREADS]
               [--clip_timestamps CLIP_TIMESTAMPS]
               [--hallucination_silence_threshold HALLUCINATION_SILENCE_THRESHOLD]
               audio [audio ...]

positional arguments:
  audio                 audio file(s) to transcribe

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         name of the Whisper model to use (default: turbo)
  --model_dir MODEL_DIR
                        the path to save model files; uses ~/.cache/whisper by default (default: None)
  --device DEVICE       device to use for PyTorch inference (default: cpu)
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        directory to save the outputs (default: .)
  --output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all}
                        format of the output file; if not specified, all available formats will be produced (default: all)
  --verbose VERBOSE     whether to print out the progress and debug messages (default: True)
  --task {transcribe,translate}
                        whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate') (default: transcribe)
  --language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}
                        language spoken in the audio, specify None to perform language detection (default: None)
  --temperature TEMPERATURE
                        temperature to use for sampling (default: 0)
  --best_of BEST_OF     number of candidates when sampling with non-zero temperature (default: 5)
  --beam_size BEAM_SIZE
                        number of beams in beam search, only applicable when temperature is zero (default: 5)
  --patience PATIENCE   optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424,
                        the default (1.0) is equivalent to conventional beam search (default: None)
  --length_penalty LENGTH_PENALTY
                        optional token length penalty coefficient (alpha) as in https://arxiv.org/abs/1609.08144,
                        uses simple length normalization by default (default: None)
  --suppress_tokens SUPPRESS_TOKENS
                        comma-separated list of token ids to suppress during sampling; '-1' will suppress most
                        special characters except common punctuations (default: -1)
  --initial_prompt INITIAL_PROMPT
                        optional text to provide as a prompt for the first window. (default: None)
  --carry_initial_prompt CARRY_INITIAL_PROMPT
                        if True, prepend initial_prompt to every internal decode() call. May reduce the
                        effectiveness of condition_on_previous_text (default: False)
  --condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT
                        if True, provide the previous output of the model as a prompt for the next window;
                        disabling may make the text inconsistent across windows, but the model becomes less
                        prone to getting stuck in a failure loop (default: True)
  --fp16 FP16           whether to perform inference in fp16; True by default (default: True)
  --temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK
                        temperature to increase when falling back when the decoding fails to meet either of the
                        thresholds below (default: 0.2)
  --compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD
                        if the gzip compression ratio is higher than this value, treat the decoding as failed
                        (default: 2.4)
  --logprob_threshold LOGPROB_THRESHOLD
                        if the average log probability is lower than this value, treat the decoding as failed
                        (default: -1.0)
  --no_speech_threshold NO_SPEECH_THRESHOLD
                        if the probability of the <|nospeech|> token is higher than this value AND the decoding
                        has failed due to `logprob_threshold`, consider the segment as silence (default: 0.6)
  --word_timestamps WORD_TIMESTAMPS
                        (experimental) extract word-level timestamps and refine the results based on them
                        (default: False)
  --prepend_punctuations PREPEND_PUNCTUATIONS
                        if word_timestamps is True, merge these punctuation symbols with the next word (default:
                        "'“¿([{-)
  --append_punctuations APPEND_PUNCTUATIONS
                        if word_timestamps is True, merge these punctuation symbols with the previous word
                        (default: "'.。,，!！?？:：”)]}、)
  --highlight_words HIGHLIGHT_WORDS
                        (requires --word_timestamps True) underline each word as it is spoken in srt and vtt
                        (default: False)
  --max_line_width MAX_LINE_WIDTH
                        (requires --word_timestamps True) the maximum number of characters in a line before
                        breaking the line (default: None)
  --max_line_count MAX_LINE_COUNT
                        (requires --word_timestamps True) the maximum number of lines in a segment (default: None)
  --max_words_per_line MAX_WORDS_PER_LINE
                        (requires --word_timestamps True, no effect with --max_line_width) the maximum number of
                        words in a segment (default: None)
  --threads THREADS     number of threads used by torch for CPU inference; supercedes
                        MKL_NUM_THREADS/OMP_NUM_THREADS (default: 0)
  --clip_timestamps CLIP_TIMESTAMPS
                        comma-separated list start,end,start,end,... timestamps (in seconds) of clips to
                        process, where the last end timestamp defaults to the end of the file (default: 0)
  --hallucination_silence_threshold HALLUCINATION_SILENCE_THRESHOLD
                        (requires --word_timestamps True) skip silent periods longer than this threshold (in
                        seconds) when a possible hallucination is detected (default: None)
```
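When transcribing many files with the flags used throughout this post, it can help to assemble the CLI invocation programmatically instead of retyping it. A minimal sketch using only options shown in the help above; `build_whisper_cmd` is a hypothetical helper, and the command is only printed, not executed (running it requires Whisper installed and enough RAM for the chosen model).

```python
import shlex

def build_whisper_cmd(audio_path: str, model: str = "small",
                      language: str = "French", out_format: str = "srt") -> list[str]:
    """Assemble the whisper CLI invocation used in this post (CPU, fp16 off)."""
    return [
        "whisper", audio_path,
        "--model", model,              # only "small" fit in 3.81 GiB here
        "--language", language,
        "--fp16", "False",             # fp16 is pointless for CPU inference
        "--output_format", out_format, # srt, loadable as VLC subtitles
        "--task", "transcribe",
    ]

cmd = build_whisper_cmd("/home/YSALMON/audio/audio.ogg")
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell-quoting issues with spaces in audio file names, which the quoted paths in the commands above work around manually.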
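The .srt files Whisper writes use `HH:MM:SS,mmm` timestamps, so a tiny parser is enough to sanity-check cue times before loading the subtitles in VLC. A sketch with a hypothetical `parse_srt_time` helper, assuming standard SRT formatting.

```python
def parse_srt_time(stamp: str) -> float:
    """Convert an SRT timestamp like '00:01:02,500' to seconds."""
    hms, ms = stamp.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s + int(ms) / 1000

# Timing line of a typical SRT cue (illustrative):
line = "00:01:02,500 --> 00:01:05,000"
start, end = (parse_srt_time(t) for t in line.split(" --> "))
print(start, end)  # 62.5 65.0
```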