#archiveteam-bs 2018-12-03,Mon

Time Nickname Message
00:27 🔗 ndiddy has joined #archiveteam-bs
00:33 🔗 Sk1d has quit IRC (Read error: Operation timed out)
00:35 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
00:36 🔗 RichardG has joined #archiveteam-bs
00:36 🔗 Sk1d has joined #archiveteam-bs
00:48 🔗 wopls2 has joined #archiveteam-bs
00:48 🔗 pie___ has quit IRC (Read error: Operation timed out)
00:53 🔗 dashcloud has joined #archiveteam-bs
01:17 🔗 Martle has joined #archiveteam-bs
01:25 🔗 Verified_ has quit IRC (Quit: Leaving)
01:25 🔗 wp494 has quit IRC (Ping timeout: 633 seconds)
01:26 🔗 wp494 has joined #archiveteam-bs
02:10 🔗 qw3rty114 has joined #archiveteam-bs
02:14 🔗 qw3rty113 has quit IRC (Read error: Operation timed out)
03:22 🔗 vitzli has joined #archiveteam-bs
03:43 🔗 adinbied has joined #archiveteam-bs
03:57 🔗 Isanami has joined #archiveteam-bs
04:14 🔗 Isanami has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
04:16 🔗 qw3rty115 has joined #archiveteam-bs
04:17 🔗 odemgi_ has joined #archiveteam-bs
04:19 🔗 qw3rty114 has quit IRC (Read error: Operation timed out)
04:20 🔗 odemgi has quit IRC (Read error: Operation timed out)
04:20 🔗 odemg has quit IRC (Ping timeout: 265 seconds)
04:32 🔗 odemg has joined #archiveteam-bs
04:58 🔗 hdch has quit IRC (Ping timeout: 265 seconds)
05:10 🔗 Mateon1 has quit IRC (Remote host closed the connection)
05:10 🔗 Mateon1 has joined #archiveteam-bs
05:16 🔗 SketchCo1 is now known as SketchCow
05:31 🔗 ndiddy has quit IRC ()
05:38 🔗 fredgido_ has quit IRC (Read error: Connection reset by peer)
05:39 🔗 fredgido_ has joined #archiveteam-bs
06:02 🔗 hdch has joined #archiveteam-bs
06:50 🔗 hook54321 has joined #archiveteam-bs
07:17 🔗 vitzli has quit IRC (Quit: Leaving)
07:43 🔗 schbirid has joined #archiveteam-bs
07:52 🔗 odemgi has joined #archiveteam-bs
07:54 🔗 odemgi_ has quit IRC (Read error: Operation timed out)
07:55 🔗 ranma has joined #archiveteam-bs
08:06 🔗 LFlare43 has quit IRC (Remote host closed the connection)
08:06 🔗 LFlare43 has joined #archiveteam-bs
08:12 🔗 hdch has quit IRC (Remote host closed the connection)
08:14 🔗 godane SketchCow: i'm uploading box1 of vhscover scans
08:15 🔗 godane before i ship tapes not scanned i will see about getting them scanned
08:16 🔗 godane i want to say any tapes before june of this year most likely didn't get scanned
08:16 🔗 hdch has joined #archiveteam-bs
08:16 🔗 godane https://archive.org/details/vhscovers-misc-jason-scott-box1-20181203
10:00 🔗 Lord_Nigh that 35gb font archive jason was talking about on twitter sounds really nifty.
10:26 🔗 wp494 has quit IRC (Read error: Operation timed out)
10:27 🔗 wp494 has joined #archiveteam-bs
10:36 🔗 BlueMax has quit IRC (Remote host closed the connection)
10:37 🔗 BlueMax has joined #archiveteam-bs
11:09 🔗 BlueMax has quit IRC (Quit: Leaving)
11:09 🔗 hdch has quit IRC (Quit: oops)
12:06 🔗 LFlare43 has quit IRC (Ping timeout: 268 seconds)
12:09 🔗 REiN^ has quit IRC (no.money.no.love)
12:13 🔗 pie___ has joined #archiveteam-bs
12:13 🔗 wopls2 has quit IRC (Read error: Operation timed out)
12:57 🔗 brayden has quit IRC (Ping timeout: 259 seconds)
12:57 🔗 brayden has joined #archiveteam-bs
12:57 🔗 swebb sets mode: +o brayden
13:05 🔗 LFlare43 has joined #archiveteam-bs
13:52 🔗 REiN^ has joined #archiveteam-bs
13:56 🔗 VerifiedJ has joined #archiveteam-bs
14:04 🔗 x0r_ has joined #archiveteam-bs
14:41 🔗 Martle has quit IRC (Quit: Leaving)
14:42 🔗 REiN^ has quit IRC (no.money.no.love)
14:42 🔗 Iridium has joined #archiveteam-bs
14:47 🔗 wopls2 has joined #archiveteam-bs
14:47 🔗 pie___ has quit IRC (Read error: Operation timed out)
14:49 🔗 tsr has quit IRC (Ping timeout: 260 seconds)
15:05 🔗 tsr has joined #archiveteam-bs
15:55 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
15:56 🔗 dashcloud has joined #archiveteam-bs
16:29 🔗 ola_norsk has joined #archiveteam-bs
16:31 🔗 ola_norsk Kaz: here's the yt annotations warc i did the other day https://archive.org/details/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01
16:32 🔗 ola_norsk of the 40k something i grepped into a list from the ids. A lot of them were 404, but that is to be expected i guess
16:35 🔗 ola_norsk after that, i took all the ids, and simply prepended the annotations link to them. Though my computer is such a potato, it took 24 hours just to get grab-site going after inputting the list lol
16:35 🔗 Kaz cool
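The "prepend the link to every id" step described above could look roughly like the sketch below. The log does not spell out the real annotations URL, so `EXAMPLE_PREFIX` is a placeholder and `build_url_list` is a hypothetical helper name, not something taken from the channel.

```python
def build_url_list(video_ids, prefix):
    """Prepend `prefix` to each non-empty video id, preserving input order."""
    urls = []
    for raw in video_ids:
        video_id = raw.strip()  # drop trailing newlines and stray spaces
        if video_id:            # skip blank lines in the id list
            urls.append(prefix + video_id)
    return urls


# Placeholder only: substitute the actual annotations link scheme.
EXAMPLE_PREFIX = "https://example.invalid/annotations?video_id="

if __name__ == "__main__":
    sample_ids = ["abc123DEF_-\n", "", "xyz789GHI00\n"]
    for url in build_url_list(sample_ids, EXAMPLE_PREFIX):
        print(url)
```

Because the function accepts any iterable, an open file object can be passed in directly, so a very large id list does not have to be loaded into a separate list first.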
16:35 🔗 Kaz it needs to be set to mediatype web (and possibly moved to a different collection) I think
16:36 🔗 Kaz I'm sure someone here knows more than I do on that front
16:36 🔗 ola_norsk i think i've set "web" mediatype on it already
16:38 🔗 Kaz you may have done - I thought only certain people could do that
16:38 🔗 Kaz it's been a while since I've uploaded anything outside of warrior projects
16:42 🔗 ola_norsk it seems to work when using "ia" tool and "--metadata=mediatype:web", but not possible through their website uploader, at least not for me.
16:50 🔗 JAA Anyone can set the mediatype on the initial upload through the ia tool. Only IA admins can change it after the upload.
16:50 🔗 JAA As far as I know, anyway.
16:51 🔗 ola_norsk ok
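A minimal sketch of the upload discussed above, assuming the `ia` command-line tool is installed and configured. The `--metadata=mediatype:web` argument is the one quoted in the log; the identifier and file name in the usage lines are made up for illustration, and both function names are hypothetical.

```python
import subprocess


def ia_upload_command(identifier, path, mediatype="web"):
    """Build the argv for an `ia upload` that sets mediatype on first upload."""
    return ["ia", "upload", identifier, path, f"--metadata=mediatype:{mediatype}"]


def run_upload(identifier, path):
    """Run the upload; raises CalledProcessError if `ia` exits non-zero."""
    return subprocess.run(ia_upload_command(identifier, path), check=True)


if __name__ == "__main__":
    # Hypothetical item name and file; this only prints the command.
    print(" ".join(ia_upload_command("my-annotations-item", "crawl.warc.gz")))
```

Per JAA's remark, the mediatype has to be right on the initial upload, since only IA admins can change it afterwards.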
16:55 🔗 ola_norsk i'm guessing the links that show up with "400 Bad Request" for grab-site are either videos that don't exist anymore, or videos that don't have Annotations added to them (https://i.imgur.com/nEImdHc.png)
16:55 🔗 Mateon1 has quit IRC (Ping timeout: 255 seconds)
16:56 🔗 Mateon1 has joined #archiveteam-bs
17:12 🔗 pie___ has joined #archiveteam-bs
17:12 🔗 wopls2 has quit IRC (Read error: Operation timed out)
17:46 🔗 ola_norsk cf: i forgot, you crawled for the ids in your list?
17:46 🔗 ola_norsk cf: or generated them?
17:52 🔗 ola_norsk i'm not sure about the id conventions youtube's been using over the years, but maybe id's based on your list can be generated as well, to catch any you've not yet found by crawling.
17:56 🔗 ola_norsk yt stopped accepting annotations may/june of this year somewhere, so there's no real risk of anything being updated or missing if your list is run through. Any other id's would be "bonus" finds of sorts.
18:02 🔗 vectr0n_ has joined #archiveteam-bs
18:03 🔗 vectr0n has quit IRC (Ping timeout: 260 seconds)
18:03 🔗 vectr0n_ is now known as vectr0n
18:07 🔗 LFlare43 has quit IRC (Ping timeout: 265 seconds)
18:11 🔗 LFlare43 has joined #archiveteam-bs
18:16 🔗 Despatche has joined #archiveteam-bs
18:45 🔗 Niklink_ has joined #archiveteam-bs
18:48 🔗 JAA Sanqui: I believe HadeanEon is also VoynichCr's normal account. But yeah, I agree, this should be a separate account marked as a bot.
18:53 🔗 prsn has joined #archiveteam-bs
19:00 🔗 x0r_ has quit IRC (Ping timeout: 262 seconds)
19:01 🔗 noirscape has joined #archiveteam-bs
19:12 🔗 PurpleSym sets mode: +oo Sanqui schbirid
19:12 🔗 Despatche has quit IRC (Remote host closed the connection)
19:12 🔗 Despatche has joined #archiveteam-bs
19:12 🔗 PurpleSym Who is maintaining the wiki right now anyway?
19:26 🔗 plue_ Regarding tumblr, https://files.catbox.moe/o1di6l.xz ... This is a sort of csv(?) formatted textfile of some scraping I did back in April 2018. It contains around 7m tumblr blogs with their urls, post counts and like counts(?). There is more metadata in it, I forgot about what the other fields were for. One was an indicator for nsfw iirc.
19:27 🔗 adinbied has quit IRC (Read error: Connection reset by peer)
19:28 🔗 wp494 has quit IRC (Read error: Operation timed out)
19:28 🔗 wp494 has joined #archiveteam-bs
19:29 🔗 Niklink_ @plue_ #tumbledown
19:30 🔗 adinbied has joined #archiveteam-bs
20:07 🔗 moufu_ is now known as moufu
20:12 🔗 marked1 any adult keyword searches on tumblr returns 0 results. was it always that way?
20:13 🔗 JAA marked1: Anything regarding Tumblr in #tumbledown please.
20:14 🔗 astrid marked1: that has been the case for several years
20:29 🔗 SketchCow Poor lucas
20:29 🔗 SketchCow He's going to get off scott free
20:29 🔗 astrid is this some level of discourse that i'm not on
20:49 🔗 ola_norsk JAA: i simply read "Voynich" https://archive.org/search.php?query=Stephen%20Bax
20:50 🔗 ola_norsk (i think that guy's got the best take on Voynich manuscript yet to date)
20:51 🔗 Despatche has quit IRC (Read error: Operation timed out)
20:52 🔗 Despatche has joined #archiveteam-bs
20:59 🔗 ola_norsk i authot to be able to do the YouTube Annotations urls by january, correct ? https://i.imgur.com/W4h2iUN.png
20:59 🔗 ola_norsk aught to*
20:59 🔗 asdf has joined #archiveteam-bs
20:59 🔗 ola_norsk if not, could someone do the other halves of the url list?
21:07 🔗 BlueMax has joined #archiveteam-bs
21:08 🔗 ola_norsk Kaz: you mention putting the urls into archivebot, i'm uploading the list of urls i'm using now to ia. It seems to be a textfile of 1644MB
21:09 🔗 ola_norsk Kaz: so it wouldn't be possible to pastebin it etc.
21:09 🔗 Kaz Pffft, easy enough to break it into pieces and host them somewhere
21:10 🔗 Kaz Really depends how quickly Google bans IPs
21:11 🔗 ola_norsk i'm not banned yet, i do 10 concurrency with delay 10-500ms ..but yeah; that is a worry of mine :D
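The "10 concurrency with delay 10-500ms" setting can be illustrated with a generic throttling sketch. This is not how grab-site implements it; `crawl` and the `fetch` callback are hypothetical names, and the caller supplies whatever coroutine actually performs the request.

```python
import asyncio
import random


async def crawl(urls, fetch, concurrency=10, delay_ms=(10, 500)):
    """Call `fetch(url)` for every URL with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    results = {}

    async def worker(url):
        async with semaphore:
            # Random pause before each request, drawn from the ms range.
            await asyncio.sleep(random.uniform(*delay_ms) / 1000)
            results[url] = await fetch(url)

    await asyncio.gather(*(worker(url) for url in urls))
    return results


async def fake_fetch(url):
    """Stand-in for a real HTTP request; returns the URL length."""
    return len(url)


if __name__ == "__main__":
    print(asyncio.run(crawl(["a", "bb", "ccc"], fake_fetch, concurrency=2)))
```

The semaphore caps how many requests run at once, and the randomised sleep spreads them out so the traffic is less bursty.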
21:13 🔗 Despatche has quit IRC (Read error: Operation timed out)
21:14 🔗 ola_norsk i doubt they even notice me doing it, but can't concurrency be set with archivebot?
21:16 🔗 ola_norsk ...imagine how much easier it would be if there was some way to contact Youtube and say "give the XML's"..
21:18 🔗 ola_norsk traffic or requests i doubt they give a shit about, problem is if some dingus notices it and thinks it's some company doing datamining instead of paying for their API
21:37 🔗 ola_norsk Kaz: https://archive.org/manage/yt_annotations_xml_urls_cf_based . The url list i use, based on cf's ids and eientei95's (i think) xml link scheme
21:38 🔗 ola_norsk https://archive.org/download/yt_annotations_xml_urls_cf_based/yt_anot_urls_nodupcheck.txt
21:41 🔗 ola_norsk if that can be fed to archivebot, it would be nice :D
21:43 🔗 odemgi_ has joined #archiveteam-bs
21:43 🔗 odemgi has quit IRC (Read error: Connection reset by peer)
21:43 🔗 fredgido_ has quit IRC (Remote host closed the connection)
21:44 🔗 fredgido_ has joined #archiveteam-bs
21:46 🔗 ola_norsk i'm 131k urls into it by myself (i think) btw
21:47 🔗 JensRex has joined #archiveteam-bs
21:49 🔗 ola_norsk though should grab-site fire off a duplicate check, i bet it would take 12+ hours on my shit archiving pc lol
21:53 🔗 Jens has quit IRC (Ping timeout: 633 seconds)
21:57 🔗 HCross ola_norsk: upload what you've got, and then we can split it. I don't mind doing some
22:00 🔗 ola_norsk HCross: https://archive.org/manage/yt_annotations_xml_urls_cf_based , but subtract https://ia601505.us.archive.org/29/items/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01/ola_norsk_yt_anot_urls.txt
22:01 🔗 ola_norsk HCross: this is upload so far: https://archive.org/details/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01
22:01 🔗 Despatche has joined #archiveteam-bs
22:01 🔗 hdch has joined #archiveteam-bs
22:02 🔗 JensRex has quit IRC (Read error: Connection reset by peer)
22:03 🔗 Jens has joined #archiveteam-bs
22:04 🔗 ola_norsk HCross: that upload is where i grepped from cf's ids, but i messed up, so doing all the ids now, but that link is what i got from the first run
22:07 🔗 ola_norsk HCross: just reverse https://archive.org/download/yt_annotations_xml_urls_cf_based/yt_anot_urls_nodupcheck.txt or pick from the middle and down, and it would be good
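The division of work suggested above (take the full list, subtract what is already grabbed, then start from the bottom or the middle) might be sketched like this. Both helper names are hypothetical, and the duplicate removal is an assumption based on the "nodupcheck" file name.

```python
def remaining_urls(all_urls, done_urls):
    """Return URLs not yet grabbed, in original order, with duplicates dropped."""
    done = set(done_urls)
    seen = set()
    remaining = []
    for url in all_urls:
        if url not in done and url not in seen:
            seen.add(url)
            remaining.append(url)
    return remaining


def back_half_reversed(urls):
    """Return the second half of `urls`, last entry first."""
    middle = len(urls) // 2
    return list(reversed(urls[middle:]))


if __name__ == "__main__":
    todo = remaining_urls(["a", "b", "a", "c", "d", "e"], ["b"])
    print(back_half_reversed(todo))
```

With one person working from the top of the list and a helper working the back half in reverse, the two runs move toward each other instead of overlapping.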
22:12 🔗 SketchCow changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | https://twitter.com/textfiles/status/1069715869994020867
22:14 🔗 ola_norsk Kids are the future
22:17 🔗 Jens has quit IRC (Ping timeout: 633 seconds)
22:18 🔗 asdf has quit IRC (Quit: Page closed)
22:19 🔗 Ryz has joined #archiveteam-bs
22:20 🔗 Jens has joined #archiveteam-bs
22:20 🔗 ola_norsk has quit IRC ("Youth is wasted on the young" https://youtu.be/L_VEdra0wUE)
22:41 🔗 Jens has quit IRC (Remote host closed the connection)
22:42 🔗 Jens has joined #archiveteam-bs
23:51 🔗 Niklink_ is now known as Niklink
