[00:27] *** ndiddy has joined #archiveteam-bs
[00:33] *** Sk1d has quit IRC (Read error: Operation timed out)
[00:35] *** RichardG has quit IRC (Read error: Connection reset by peer)
[00:36] *** RichardG has joined #archiveteam-bs
[00:36] *** Sk1d has joined #archiveteam-bs
[00:48] *** wopls2 has joined #archiveteam-bs
[00:48] *** pie___ has quit IRC (Read error: Operation timed out)
[00:53] *** dashcloud has joined #archiveteam-bs
[01:17] *** Martle has joined #archiveteam-bs
[01:25] *** Verified_ has quit IRC (Quit: Leaving)
[01:25] *** wp494 has quit IRC (Ping timeout: 633 seconds)
[01:26] *** wp494 has joined #archiveteam-bs
[02:10] *** qw3rty114 has joined #archiveteam-bs
[02:14] *** qw3rty113 has quit IRC (Read error: Operation timed out)
[03:22] *** vitzli has joined #archiveteam-bs
[03:43] *** adinbied has joined #archiveteam-bs
[03:57] *** Isanami has joined #archiveteam-bs
[04:14] *** Isanami has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[04:16] *** qw3rty115 has joined #archiveteam-bs
[04:17] *** odemgi_ has joined #archiveteam-bs
[04:19] *** qw3rty114 has quit IRC (Read error: Operation timed out)
[04:20] *** odemgi has quit IRC (Read error: Operation timed out)
[04:20] *** odemg has quit IRC (Ping timeout: 265 seconds)
[04:32] *** odemg has joined #archiveteam-bs
[04:58] *** hdch has quit IRC (Ping timeout: 265 seconds)
[05:10] *** Mateon1 has quit IRC (Remote host closed the connection)
[05:10] *** Mateon1 has joined #archiveteam-bs
[05:16] *** SketchCo1 is now known as SketchCow
[05:31] *** ndiddy has quit IRC ()
[05:38] *** fredgido_ has quit IRC (Read error: Connection reset by peer)
[05:39] *** fredgido_ has joined #archiveteam-bs
[06:02] *** hdch has joined #archiveteam-bs
[06:50] *** hook54321 has joined #archiveteam-bs
[07:17] *** vitzli has quit IRC (Quit: Leaving)
[07:43] *** schbirid has joined #archiveteam-bs
[07:52] *** odemgi has joined #archiveteam-bs
[07:54] *** odemgi_ has quit IRC (Read error: Operation timed out)
[07:55] *** ranma has joined #archiveteam-bs
[08:06] *** LFlare43 has quit IRC (Remote host closed the connection)
[08:06] *** LFlare43 has joined #archiveteam-bs
[08:12] *** hdch has quit IRC (Remote host closed the connection)
[08:14] SketchCow: i'm uploading box1 of vhscover scans
[08:15] before i ship tapes not scanned i will see about getting them scanned
[08:16] i want to say any tapes before june of this year most likely didn't get scanned
[08:16] *** hdch has joined #archiveteam-bs
[08:16] https://archive.org/details/vhscovers-misc-jason-scott-box1-20181203
[10:00] that 35gb font archive jason was talking about on twitter sounds really nifty.
[10:26] *** wp494 has quit IRC (Read error: Operation timed out)
[10:27] *** wp494 has joined #archiveteam-bs
[10:36] *** BlueMax has quit IRC (Remote host closed the connection)
[10:37] *** BlueMax has joined #archiveteam-bs
[11:09] *** BlueMax has quit IRC (Quit: Leaving)
[11:09] *** hdch has quit IRC (Quit: oops)
[12:06] *** LFlare43 has quit IRC (Ping timeout: 268 seconds)
[12:09] *** REiN^ has quit IRC (no.money.no.love)
[12:13] *** pie___ has joined #archiveteam-bs
[12:13] *** wopls2 has quit IRC (Read error: Operation timed out)
[12:57] *** brayden has quit IRC (Ping timeout: 259 seconds)
[12:57] *** brayden has joined #archiveteam-bs
[12:57] *** swebb sets mode: +o brayden
[13:05] *** LFlare43 has joined #archiveteam-bs
[13:52] *** REiN^ has joined #archiveteam-bs
[13:56] *** VerifiedJ has joined #archiveteam-bs
[14:04] *** x0r_ has joined #archiveteam-bs
[14:41] *** Martle has quit IRC (Quit: Leaving)
[14:42] *** REiN^ has quit IRC (no.money.no.love)
[14:42] *** Iridium has joined #archiveteam-bs
[14:47] *** wopls2 has joined #archiveteam-bs
[14:47] *** pie___ has quit IRC (Read error: Operation timed out)
[14:49] *** tsr has quit IRC (Ping timeout: 260 seconds)
[15:05] *** tsr has joined #archiveteam-bs
[15:55] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[15:56] *** dashcloud has joined #archiveteam-bs
[16:29] *** ola_norsk has joined #archiveteam-bs
[16:31] Kaz: here's the yt annotations warc i did the other day https://archive.org/details/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01
[16:32] of the 40k something i grepped into a list from the ids. Lot of them was 404, but that is to be expected i guess
[16:35] after that, i took all the ids, and simply prepended the annotations link to them. Though my computer is such a potato, it took 24 hours just to get grab-site going after inputting the list lol
[16:35] cool
[16:35] it needs to be set to mediatype web (and possibly moved to a different collection) I think
[16:36] I'm sure someone here knows more than I do on that front
[16:36] i think i've set "web" mediatype on it already
[16:38] you may have done - I thought only certain people could do that
[16:38] it's been a while since I've uploaded anything outside of warrior projects
[16:42] it seems to work when using "ia" tool and "--metadata=mediatype:web", but not possible through their website uploader, at least not for me.
[16:50] Anyone can set the mediatype on the initial upload through the ia tool. Only IA admins can change it after the upload.
[16:50] As far as I know, anyway.
[16:51] ok
[16:55] i'm guessing the links that show up with "400 Bad Request" for grab-site is either videos that don't exist anymore, or videos that don't have Annotations added to them (https://i.imgur.com/nEImdHc.png)
[16:55] *** Mateon1 has quit IRC (Ping timeout: 255 seconds)
[16:56] *** Mateon1 has joined #archiveteam-bs
[17:12] *** pie___ has joined #archiveteam-bs
[17:12] *** wopls2 has quit IRC (Read error: Operation timed out)
[17:46] cf: i forgot, you crawled for the ids in your list?
[17:46] cf: or generated them?
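Editor's sketch of the "prepend the annotations link to the ids" step described at 16:35. The log never states the actual link, so ANNOTATIONS_PREFIX below is an assumption about the scheme that was used, and build_annotation_urls is a hypothetical helper name. The 11-character id pattern reflects the usual YouTube video id shape and is used here only to drop obviously malformed lines.

```python
import re

# Assumed URL scheme: the log only says an "annotations link" was prepended.
ANNOTATIONS_PREFIX = "https://www.youtube.com/annotations_invideo?video_id="

# Typical YouTube video ids: 11 characters from A-Z, a-z, 0-9, "-" and "_".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def build_annotation_urls(ids, prefix=ANNOTATIONS_PREFIX):
    """Return one URL per unique, well-formed id, keeping first-seen order."""
    seen = set()
    urls = []
    for raw in ids:
        video_id = raw.strip()
        # Skip malformed lines and ids that were already emitted.
        if not VIDEO_ID_RE.fullmatch(video_id) or video_id in seen:
            continue
        seen.add(video_id)
        urls.append(prefix + video_id)
    return urls


if __name__ == "__main__":
    # The id below is the one from the YouTube link quoted later in this log.
    sample = ["L_VEdra0wUE\n", "not an id", "L_VEdra0wUE"]
    for url in build_annotation_urls(sample):
        print(url)
```

Deduplicating while building the list is a design choice for small id sets; for a list in the gigabyte range the `seen` set may not fit in memory on a modest machine.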
[17:52] i'm not sure about the id conventions youtube's been using over the years, but maybe id's based on your list can be generated as well, to catch any you've not yet found by crawling.
[17:56] yt stopped accepting annotations may/june of this year somewhere, so there's no real risk of anything being updated or missing if your list is run through. Any other id's would be "bonus" finds of sorts.
[18:02] *** vectr0n_ has joined #archiveteam-bs
[18:03] *** vectr0n has quit IRC (Ping timeout: 260 seconds)
[18:03] *** vectr0n_ is now known as vectr0n
[18:07] *** LFlare43 has quit IRC (Ping timeout: 265 seconds)
[18:11] *** LFlare43 has joined #archiveteam-bs
[18:16] *** Despatche has joined #archiveteam-bs
[18:45] *** Niklink_ has joined #archiveteam-bs
[18:48] Sanqui: I believe HadeanEon is also VoynichCr's normal account. But yeah, I agree, this should be a separate account marked as a bot.
[18:53] *** prsn has joined #archiveteam-bs
[19:00] *** x0r_ has quit IRC (Ping timeout: 262 seconds)
[19:01] *** noirscape has joined #archiveteam-bs
[19:12] *** PurpleSym sets mode: +oo Sanqui schbirid
[19:12] *** Despatche has quit IRC (Remote host closed the connection)
[19:12] *** Despatche has joined #archiveteam-bs
[19:12] Who is maintaining the wiki right now anyway?
[19:26] Regarding tumblr, https://files.catbox.moe/o1di6l.xz ... This is a sort of csv(?) formatted textfile of some scraping I did back in April 2018. It contains around 7m tumblr blogs with their urls, post counts and like counts(?). There is more metadata in it, I forgot about what the other fields were for. One was an indicator for nsfw iirc.
[19:27] *** adinbied has quit IRC (Read error: Connection reset by peer)
[19:28] *** wp494 has quit IRC (Read error: Operation timed out)
[19:28] *** wp494 has joined #archiveteam-bs
[19:29] @plue_ #tumbledown
[19:30] *** adinbied has joined #archiveteam-bs
[20:07] *** moufu_ is now known as moufu
[20:12] any adult keyword searches on tumblr returns 0 results. was it always that way?
[20:13] marked1: Anything regarding Tumblr in #tumbledown please.
[20:14] marked1: that has been the case for several years
[20:29] Poor lucas
[20:29] He's going to get off scott free
[20:29] is this some level of discourse that i'm not on
[20:49] JAA: i simply read "Voynich" https://archive.org/search.php?query=Stephen%20Bax
[20:50] (i think that guy's got the best take on Voynich manuscript yet to date)
[20:51] *** Despatche has quit IRC (Read error: Operation timed out)
[20:52] *** Despatche has joined #archiveteam-bs
[20:59] i authot to be able to do the YouTube Annotations urls by january, correct? https://i.imgur.com/W4h2iUN.png
[20:59] aught to*
[20:59] *** asdf has joined #archiveteam-bs
[20:59] if not, could someone do the other halves of the url list?
[21:07] *** BlueMax has joined #archiveteam-bs
[21:08] Kaz: you mention putting the urls into archivebot, i'm uploading the list of urls i'm using now to ia. It seems to be a textfile of 1644MB
[21:09] Kaz: so it wouldn't be possible to pastebin it etc.
[21:09] Pffft, easy enough to break it into pieces and host them somewhere
[21:10] Really depends how quickly Google bans IPs
[21:11] i'm not banned yet, i do 10 concurrency with delay 10-500ms ..but yeah; that is a worry of mine :D
[21:13] *** Despatche has quit IRC (Read error: Operation timed out)
[21:14] i doubt they even notice me doing it, but can't concurrency be set with archivebot?
[21:16] ...imagine how much easier it would be if there was some way to contact Youtube and say "give the XML's"..
[21:18] traffic or requests i doubt they give a shit about, problem is if some dingus notices it and thinks it's some company doing datamining instead of paying for their API
[21:37] Kaz: https://archive.org/manage/yt_annotations_xml_urls_cf_based . The url list i use, based on cf's ids and eientei95's (i think) xml link scheme
[21:38] https://archive.org/download/yt_annotations_xml_urls_cf_based/yt_anot_urls_nodupcheck.txt
[21:41] if that can be fed to archivebot, it would be nice :D
[21:43] *** odemgi_ has joined #archiveteam-bs
[21:43] *** odemgi has quit IRC (Read error: Connection reset by peer)
[21:43] *** fredgido_ has quit IRC (Remote host closed the connection)
[21:44] *** fredgido_ has joined #archiveteam-bs
[21:46] i'm 131k urls into it by myself (i think) btw
[21:47] *** JensRex has joined #archiveteam-bs
[21:49] though should grab-site fire off a duplicate check, i bet it would take 12+ hours on my shit archiving pc lol
[21:53] *** Jens has quit IRC (Ping timeout: 633 seconds)
[21:57] ola_norsk: upload what you've got, and then we can split it. I don't mind doing some
[22:00] HCross: https://archive.org/manage/yt_annotations_xml_urls_cf_based , but subtract https://ia601505.us.archive.org/29/items/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01/ola_norsk_yt_anot_urls.txt
[22:01] HCross: this is upload so far: https://archive.org/details/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01
[22:01] *** Despatche has joined #archiveteam-bs
[22:01] *** hdch has joined #archiveteam-bs
[22:02] *** JensRex has quit IRC (Read error: Connection reset by peer)
[22:03] *** Jens has joined #archiveteam-bs
[22:04] HCross: that upload is where i grepped from cf's ids, but i messed up, so doing all the ids now, but that link is what i got from the first run
[22:07] HCross: just reverse https://archive.org/download/yt_annotations_xml_urls_cf_based/yt_anot_urls_nodupcheck.txt or pick from the middle and down, and it would be good
[22:12] *** SketchCow changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | https://twitter.com/textfiles/status/1069715869994020867
[22:14] Kids are the future
[22:17] *** Jens has quit IRC (Ping timeout: 633 seconds)
[22:18] *** asdf has quit IRC (Quit: Page closed)
[22:19] *** Ryz has joined #archiveteam-bs
[22:20] *** Jens has joined #archiveteam-bs
[22:20] *** ola_norsk has quit IRC ("Youth is wasted on the young" https://youtu.be/L_VEdra0wUE)
[22:41] *** Jens has quit IRC (Remote host closed the connection)
[22:42] *** Jens has joined #archiveteam-bs
[23:51] *** Niklink_ is now known as Niklink
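Editor's sketch of the "subtract what is already uploaded, then split it" plan discussed between 21:09 and 22:07. It is a minimal illustration, not what anyone in the channel actually ran: remaining_urls and chunked are hypothetical helper names, and the example.org URLs are placeholders. Only the already-fetched list is held in memory, so the large list can be streamed line by line from a file. No duplicate check is done on the large list.

```python
from itertools import islice


def remaining_urls(all_urls, done_urls):
    """Yield URLs from all_urls that do not appear in done_urls."""
    # The "done" list is the small one, so it is safe to keep as a set.
    done = {line.strip() for line in done_urls}
    for line in all_urls:
        url = line.strip()
        if url and url not in done:
            yield url


def chunked(urls, chunk_size):
    """Yield successive lists of at most chunk_size URLs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Placeholder data; in practice these would be open file objects.
    full = ["https://example.org/a\n", "https://example.org/b\n", "https://example.org/c\n"]
    done = ["https://example.org/a\n"]
    for number, chunk in enumerate(chunked(remaining_urls(full, done), 1), start=1):
        print(number, chunk)
```

Each chunk could then be written to its own text file and handed to a different volunteer, which also covers the "pick from the middle and down" suggestion without two people fetching the same URLs.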