Using a GAN architecture to restore heavily compressed music files

🎶 2022-08-31 16:40:02 - Paris/France.

Spectrograms of (a) original audio excerpts, (b) the corresponding 32 kbit/s MP3 versions, and (c), (d), (e) restorations obtained with different randomly sampled noise z ∼ N(0, I). Credit: Lattner & Nistal.

Over the past few decades, computer scientists have developed increasingly advanced technologies and tools for storing music and audio files on electronic devices. A key milestone in music storage was the development of MP3 (i.e., MPEG-1 Audio Layer 3) technology, a technique for compressing sound clips or songs into small files that can easily be stored and transferred between devices.

The encoding, editing and compression of media files, including PKZIP, JPEG, GIF, PNG, MP3, AAC, Cinepak and MPEG-2 files, are achieved using a set of technologies known as codecs. Codecs are compression technologies with two core components: an encoder that compresses files and a decoder that decompresses them.

There are two types of codecs, known as lossless and lossy codecs. When decompressing, lossless codecs, such as PKZIP and PNG, reproduce a file identical to the original. Lossy compression techniques, on the other hand, produce a facsimile of the original file that sounds (or looks) like the original but takes up less storage space on electronic devices.
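To make the lossless/lossy distinction concrete, here is a minimal sketch in Python. The lossless half uses the standard-library `zlib` module (DEFLATE, the algorithm family also used in ZIP archives); the lossy half is a hypothetical toy codec, `lossy_roundtrip`, that simply rounds audio samples to a coarse grid. It is not how any real audio codec works, only an illustration of information being thrown away for good.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib: the output is bit-for-bit identical."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(samples: list[float], step: float = 0.25) -> list[float]:
    """Toy lossy codec (hypothetical): round each sample to the nearest multiple of `step`."""
    codes = [round(s / step) for s in samples]  # small integers are cheaper to store
    return [c * step for c in codes]  # the discarded fine detail cannot be recovered


text = b"la la la " * 100
print(lossless_roundtrip(text) == text)  # True
print(lossy_roundtrip([0.1, 0.37, -0.8]))  # [0.0, 0.25, -0.75]
```

The lossy output is close to the input but not equal to it; a coarser `step` would shrink the codes further and widen the gap.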

Lossy audio codecs work by compressing digital audio streams, discarding some of the data, and then decompressing them. Generally, the difference between the original file and the decompressed file is hard or impossible for humans to perceive.
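The compress, discard, decompress cycle can be sketched with a transform codec. This is a simplified stand-in, not MP3: MP3 combines a filterbank, a modified discrete cosine transform (MDCT) and a psychoacoustic model that decides what listeners are unlikely to hear, whereas this sketch uses a plain orthonormal DCT and keeps only the `keep` largest coefficients. The names `toy_lossy_codec` and `squared_error` are illustrative.

```python
import math


def _scale(k: int, n: int) -> float:
    # Normalisation that makes the transform orthonormal (energy-preserving).
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x: list[float]) -> list[float]:
    """Orthonormal DCT-II: time-domain samples -> frequency coefficients."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c: list[float]) -> list[float]:
    """Inverse of the orthonormal DCT-II."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def toy_lossy_codec(x: list[float], keep: int) -> list[float]:
    """Encode, keep only the `keep` largest-magnitude coefficients, then decode."""
    coeffs = dct(x)
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    largest = set(ranked[:keep])
    return idct([c if k in largest else 0.0 for k, c in enumerate(coeffs)])


def squared_error(a: list[float], b: list[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


signal = [math.sin(2 * math.pi * 3 * i / 64) + 0.3 * math.sin(2 * math.pi * 11 * i / 64) for i in range(64)]
for keep in (64, 16, 4):
    print(keep, squared_error(signal, toy_lossy_codec(signal, keep)))
```

Because the transform is orthonormal, the reconstruction error equals the energy of the discarded coefficients, so keeping fewer coefficients (stronger compression) can only increase the distortion.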

However, when lossy codecs apply high compression rates, they can noticeably degrade and distort audio signals. Recently, computer scientists have tried to overcome this limitation of lossy codecs and improve the quality of heavily compressed files using deep learning techniques.

Researchers at Sony Computer Science Laboratories (CSL) have recently devised a new deep learning method to enhance and restore the quality of heavily compressed songs and audio recordings (i.e., audio files compressed by lossy codecs at high compression rates). The method, introduced in a paper pre-published on arXiv, is based on generative adversarial networks (GANs), machine learning models in which two neural networks "compete" against each other to make accurate or plausible predictions.

"Many works have addressed the problem of audio enhancement and the removal of compression artifacts using deep learning techniques," Stefan Lattner and Javier Nistal wrote in their paper. "However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In this work, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task."
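The "competition" between the two networks is usually expressed as a pair of loss functions computed from the critic's scores. The sketch below shows the standard binary cross-entropy GAN objective with the common non-saturating generator loss, working on raw critic scores (logits). The article does not say which loss Lattner and Nistal used, so treat this as a generic example rather than their formulation.

```python
import math


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def critic_loss(real_logits: list[float], fake_logits: list[float]) -> float:
    """Critic wants high scores for original audio and low scores for restorations."""
    real_term = sum(softplus(-s) for s in real_logits) / len(real_logits)  # -log sigmoid(s)
    fake_term = sum(softplus(s) for s in fake_logits) / len(fake_logits)  # -log(1 - sigmoid(s))
    return real_term + fake_term


def generator_loss(fake_logits: list[float]) -> float:
    """Non-saturating loss: the generator improves when the critic rates its output as real."""
    return sum(softplus(-s) for s in fake_logits) / len(fake_logits)


print(critic_loss([0.0], [0.0]))  # an undecided critic: 2 * ln 2, about 1.386
```

Each side lowers its own loss at the other's expense: a critic that separates real from restored audio drives `critic_loss` toward zero, which in turn raises `generator_loss` until the generator's outputs become harder to tell apart.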

Like other GANs, the model created by Lattner and Nistal is made up of two distinct models, called the "generator (G)" and the "critic (D)". The generator receives an excerpt of an MP3-compressed music signal, represented as a spectrogram (i.e., a visual representation of how the frequency content of an audio signal varies over time).
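A magnitude spectrogram is typically computed with a short-time Fourier transform (STFT): the signal is cut into overlapping windowed frames and each frame is transformed into frequency bins. The minimal sketch below uses a naive DFT for readability (real code would call an FFT library), and its frame size and hop length are arbitrary choices; the article does not describe the spectrogram settings the authors used.

```python
import cmath
import math


def hann(size: int) -> list[float]:
    """Periodic Hann window: tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / size) for i in range(size)]


def magnitude_spectrogram(signal: list[float], frame_size: int = 64, hop: int = 32) -> list[list[float]]:
    """STFT magnitudes: one row per time frame, one column per frequency bin."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_size)]
        row = []
        for k in range(frame_size // 2 + 1):  # bins 0 .. Nyquist for a real-valued signal
            coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_size) for n in range(frame_size))
            row.append(abs(coeff))
        frames.append(row)
    return frames


# A pure tone completing exactly 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = magnitude_spectrogram(tone)
print(len(spec), len(spec[0]))  # 7 33
print(max(range(len(spec[0])), key=spec[0].__getitem__))  # 8
```

For a real music excerpt the rows change over time, which is the time-frequency picture shown in the spectrogram figure above.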

The generator progressively learns to produce a restored version of this small, compressed signal. Meanwhile, the critic component of the GAN learns to distinguish between original, high-quality files and restored versions, thereby identifying the differences between them. Ultimately, the information gathered by the critic is used to improve the quality of the restored files, ensuring that the music or audio data they contain is as faithful as possible to the original.
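The alternating loop described here (the critic learns to tell originals from restorations, then the generator uses the critic's feedback) can be illustrated with a deliberately tiny toy, a variant of the "Dirac-GAN" example from the GAN-training literature. The "real" data is a single value, the "generator" outputs a single learnable value `theta`, and the "critic" scores an input x with the logit `psi * x`. The small penalty `gamma` on `psi` is an added stabiliser. None of this reflects the authors' actual networks, which operate on spectrograms.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(real: float = 2.0, theta: float = -1.0, lr: float = 0.05,
                  steps: int = 4000, gamma: float = 0.1) -> tuple[float, float]:
    """Alternate critic and generator gradient steps on a one-dimensional toy problem."""
    psi = 0.0  # critic parameter: logit(x) = psi * x
    for _ in range(steps):
        # Critic step: ascend log D(real) + log(1 - D(theta)) - gamma/2 * psi**2.
        grad_psi = real * sigmoid(-psi * real) - theta * sigmoid(psi * theta) - gamma * psi
        psi += lr * grad_psi
        # Generator step: ascend log D(theta), i.e. move toward what the critic calls "real".
        theta += lr * psi * sigmoid(-psi * theta)
    return theta, psi


theta, psi = train_toy_gan()
print(theta, psi)  # theta approaches the real value 2.0, and psi approaches 0
```

At equilibrium the critic can no longer separate the generated value from the real one (`psi` goes to zero), which is the toy counterpart of restorations the critic cannot distinguish from the originals.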

Lattner and Nistal evaluated their GAN-based architecture in a series of experiments, aimed at determining whether their model could improve the quality of MP3 inputs and produce restored samples closer to the original files than those produced by baseline models. Their results were highly promising: they found that the model's restorations of heavily compressed (16 kbit/s and 32 kbit/s) MP3 files were generally better than the compressed files themselves, as they sounded better to human listeners. At lower compression levels (64 kbit/s mono), on the other hand, the team found that their model performed slightly worse than the MP3 baseline.

"We perform an extensive evaluation of the different experiments using objective metrics and listening tests," Lattner and Nistal said. "We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators."
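For a sense of scale, the size of a constant-bitrate MP3 stream is roughly bitrate × duration / 8, ignoring headers and metadata. The 3-minute track below is hypothetical, and 128 kbit/s is included only as a commonly used MP3 bitrate for comparison with the 16, 32 and 64 kbit/s settings in the study.

```python
def mp3_size_bytes(bitrate_kbps: float, seconds: float) -> int:
    """Approximate size of a constant-bitrate stream: bits per second x duration / 8."""
    return round(bitrate_kbps * 1000 * seconds / 8)


for kbps in (16, 32, 64, 128):
    print(f"{kbps:>3} kbit/s -> {mp3_size_bytes(kbps, 180) / 1e6:.2f} MB")
# Prints:
#  16 kbit/s -> 0.36 MB
#  32 kbit/s -> 0.72 MB
#  64 kbit/s -> 1.44 MB
# 128 kbit/s -> 2.88 MB
```

Halving the bitrate halves the file size, which is why restoring quality at 16 or 32 kbit/s matters for storage and streaming.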

As part of their study, the researchers showed that their architecture can successfully generate and add plausible content that improves the audio quality of compressed songs. The generated content includes noise-like elements, singing voices producing sibilants or plosives (i.e., "s" and "t" sounds), and guitar sounds.

In the future, the model they developed could help considerably reduce the size of MP3 music files without altering their content or introducing easily noticeable errors. This could have a significant impact on the storage and distribution of music on mobile apps, streaming services (e.g., Spotify, Apple Music, etc.) and modern electronic devices, including smartphones, tablets and computers.


More information:
Stefan Lattner, Javier Nistal, Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks. arXiv:2207.01667v1 [cs.SD] arxiv.org/abs/2207.01667

© 2022 Science X Network

Citation: Using a GAN architecture to restore heavily compressed music files (2022, August 31) retrieved 1 September 2022 from https://techxplore.com/news/2022-08-gan-architecture-heavily-compressed-music.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.

SOURCE: Review News
