[01:28] i'm at 563k items now
[01:52] *** RichardG has quit IRC (Ping timeout: 499 seconds)
[02:01] *** RichardG has joined #archiveteam-bs
[02:18] *** RichardG has quit IRC (Ping timeout: 615 seconds)
[02:33] do you guys think Kim Dotcom will be extradited to US?
[02:41] *** RichardG has joined #archiveteam-bs
[02:56] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[03:09] *** RichardG has joined #archiveteam-bs
[03:59] Probably.
[04:08] Turning_Point_Presents_-_Super_Sheep_199x_VHSRip
[04:08] http://archive.org/details/Turning_Point_Presents_-_Super_Sheep_199x_VHSRip
[04:09] https://archive.org/details/NASA_-_The_First_25_Years_-_Good_Times_Home_Video_1987_VHSRip
[04:17] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[04:25] *** Nertsy has joined #archiveteam-bs
[05:41] *** JetBalsa has quit IRC (Read error: Connection reset by peer)
[07:00] https://archive.org/details/The_Making_of_the_Stooges_1984_VHSRip
[07:18] *** JesseW has quit IRC (Leaving.)
[08:58] *** robink has joined #archiveteam-bs
[09:16] *** BlueMaxim has quit IRC (Quit: Leaving)
[09:50] *** schbirid has joined #archiveteam-bs
[09:55] https://archive.org/details/Breakin_In_The_USA_1984_VHSRip
[10:46] *** VADemon has quit IRC (left4dead)
[14:32] https://events.ccc.de/congress/2015/wiki/Lightning:Internet_Radio_Recorder
[14:33] https://events.ccc.de/congress/2015/wiki/Static:Crawling
[15:46] *** marvinw is now known as ivan`
[15:48] do IA's massaged URLs (in their CDXes) cause problems in practice?
I see that they always lowercase, which could cause problems with things like imgur, but I don't know if I've ever observed problems
[15:48] investigating this because I'm going to load a lot of CDXes into a database
[15:50] hmm, I guess if you get multiple results for a massaged URL, you can look up an exact-case match
[15:58] ivan`: we got the problem with newsgrabber figured out
[15:58] it was due to encoding problems
[15:58] in this case with the dari language
[16:02] *** schbirid has quit IRC (Quit: Leaving)
[16:20] *** schbirid has joined #archiveteam-bs
[16:30] arkiver: ok if it's a grab-site thing please file a bug
[17:04] "This module depends on the tldextract module to query the Public Suffix List. tldextract can be installed via pip" https://github.com/rajbot/surt
[17:05] that is worrying to say the least
[17:05] what happens when the list changes and SURTs don't match
[17:13] https://archive.org/details/We_Are_the_World_-_The_Story_Behind_the_Song_ATV-10_1987
[17:23] oh, it implements some public suffix thing but it's behind a boolean that's always False
[17:24] Sketchcow, can you please move the Cryengine files from godane to the IA please
[17:37] https://archive.org/details/The_Red_Nose_Express_1987_VHSRip
[17:41] *** JesseW has joined #archiveteam-bs
[17:57] ivan`: I'll do that
[17:57] I found a very strange problem
[17:57] ~/.local/bin/grab-site http://www.eqmweekly.com.af/international/8288-???????-??-?????-?????-????? --level=0 --no-sitemaps --concurrency=5 --1 --warc-max-size=524288000 --wpull-args="--no-check-certificate --timeout=300"
[17:57] that works
[17:58] ~/.local/bin/grab-site http://www.eqmweekly.com.af/technology/8287-???-???????-???-???-??-??????? --level=0 --no-sitemaps --concurrency=5 --1 --warc-max-size=524288000 --wpull-args="--no-check-certificate --timeout=300"
[17:58] that does not work
[17:59] *** JetBalsa has joined #archiveteam-bs
[18:27] *** JesseW has quit IRC (Leaving.)
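The exact-case lookup idea from 15:50 could be sketched roughly as follows. This is only an illustration: `MassagedIndex` and its methods are hypothetical names, `massage()` merely lowercases as a stand-in for IA's real URL canonicalization (which does more than that), and the imgur-style URLs are made up.

```python
from collections import defaultdict


class MassagedIndex:
    """Toy index keyed by a massaged URL, remembering the original-case URLs."""

    def __init__(self):
        # massaged key -> set of original URLs that collapse onto that key
        self._by_key = defaultdict(set)

    @staticmethod
    def massage(url):
        # Assumption: lowercasing only; the real canonicalization is more involved.
        return url.lower()

    def add(self, url):
        self._by_key[self.massage(url)].add(url)

    def lookup(self, url):
        """Return (exact, others): the exact-case match or None, plus the
        other URLs that share the same massaged key."""
        candidates = self._by_key.get(self.massage(url), set())
        exact = url if url in candidates else None
        others = sorted(candidates - {url})
        return exact, others


index = MassagedIndex()
index.add("http://i.imgur.com/AbCdE.jpg")  # hypothetical URLs
index.add("http://i.imgur.com/abcde.jpg")
exact, others = index.lookup("http://i.imgur.com/AbCdE.jpg")
```

Returning the exact match separately from the case-insensitive collisions lets the caller decide whether to show a near-match at all, instead of silently serving a different resource.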
[18:46] *** VADemon has joined #archiveteam-bs
[19:05] https://archive.org/details/1994-05-12_David_Copperfield_15_Years_of_Magic
[19:29] midas,
[19:29] get in #effteepee
[19:29] then shout at me
[19:57] *** Stilett0 has joined #archiveteam-bs
[19:58] *** Stiletto has quit IRC (Read error: Operation timed out)
[20:11] *** Stilett0 has quit IRC (Read error: Operation timed out)
[20:20] dumping a postgresql database over inflight wifi is not the best experience
[20:37] hurp
[20:40] ivan`: I have seen wayback return the wrong imgur image if there is a case-insensitive match
[20:41] I'm not sure what happens if there are multiple matches, one of which is exact
[20:45] I want to make sweet sweet love
[20:46] to a womancat
[20:54] yipdw, >inflight wifi is not the best experience
[20:54] I did indeed write that, yes
[21:17] *** BlueMaxim has joined #archiveteam-bs
[21:22] https://archive.org/details/1989-07-26_Japan_TV
[21:36] Sooooooo
[21:37] at some point the FAA will put up a public list
[21:37] of all registered drone owners
[21:37] ....
publically searchable etc
[21:37] https://archive.org/details/Fisher-Price_Grimms_Fairy_Tales_-_The_Frog_Prince_1989_VHSRip
[22:02] *** xmc has quit IRC (Read error: Operation timed out)
[22:02] *** RichardG_ has joined #archiveteam-bs
[22:03] *** yakfish has quit IRC (Read error: Operation timed out)
[22:03] *** myself has quit IRC (Read error: Operation timed out)
[22:03] *** robink has quit IRC (Write error: Broken pipe)
[22:03] *** sep332 has quit IRC (Write error: Broken pipe)
[22:03] *** beardicus has quit IRC (Read error: Operation timed out)
[22:04] *** botpie91 has quit IRC (Read error: Operation timed out)
[22:06] *** RichardG has quit IRC (Read error: Operation timed out)
[22:09] *** Zebranky has quit IRC (Read error: Operation timed out)
[22:09] *** Zebranky has joined #archiveteam-bs
[22:09] *** JetBalsa has quit IRC (Read error: Operation timed out)
[22:10] *** JetBalsa has joined #archiveteam-bs
[22:10] *** rduser has quit IRC (Read error: Operation timed out)
[22:10] *** rduser has joined #archiveteam-bs
[22:10] https://archive.org/details/In_The_Aftermath_New_World_Entertainment_1988_VHSRip
[22:11] *** Sketchcow has quit IRC (Read error: Operation timed out)
[22:12] *** is- has quit IRC (Read error: Operation timed out)
[22:12] *** is-_ has joined #archiveteam-bs
[22:13] *** Baljem_ has quit IRC (Read error: Operation timed out)
[22:14] *** Sketchcow has joined #archiveteam-bs
[22:14] *** midas sets mode: +o Sketchcow
[22:14] *** swebb sets mode: +o Sketchcow
[22:14] *** GLaDOS sets mode: +o Sketchcow
[22:19] *** Baljem has joined #archiveteam-bs
[22:30] *** is-_ is now known as is-
[22:30] *** kyan has joined #archiveteam-bs
[22:35] *** schbirid has quit IRC (Quit: Leaving)
[22:40] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[22:45] DFJustin: it looks like it prefers the latest snapshot instead of the exact-case match
[22:46] I just contaminated https://news.ycombinator.com/user?id=rms with
https://news.ycombinator.com/user?id=RMS in wayback
[22:46] I'm probably going to have domain-specific rules for my massaged URLs and re-generate them whenever I add new rules
[22:47] even if you priority exact-case matches it's bad UX to tell a user you have something when it's the wrong thing
[22:47] prioritize
[23:05] arkiver: works for me. I assume you are quoting URLs with question marks if you are dumping them into a shell?
[23:19] ivan`: for me only the first one line works. And then I just dump the exact same line as I pasted above in the terminal
[23:22] arkiver: can you paste an error?
[23:22] sorry, they don't contain question marks
[23:22] wait I'll put them up somewhere else
[23:24] *** Stiletto has joined #archiveteam-bs
[23:25] ivan`: https://ia601500.us.archive.org/35/items/testlinesurls36943/testlines.txt
[23:25] you should see some kind of arabic characters
[23:26] the first lines works for me only
[23:26] heh yes finally an error
[23:26] (I see it here)
[23:26] the second line gives an 'URL is not printable' error
[23:26] ok
[23:27] arkiver: I blame wpull. try encoding your input URLs?
[23:27] utf-8?
[23:27] urlencoding, that is, unicode -> utf-8 -> %XX%XX%XX for the path
[23:27] yeah
[23:27] sorry, not very into encoding
[23:29] I suppose I should either fix this in grab-site or wpull
[23:31] seems to be working with encoding them first
[23:31] *** botpie91 has joined #archiveteam-bs
[23:31] I feel this is more a wpull problem
[23:31] *** yakfish has joined #archiveteam-bs
[23:32] *** robink has joined #archiveteam-bs
[23:33] *** beardicus has joined #archiveteam-bs
[23:34] *** sep332 has joined #archiveteam-bs
[23:36] *** myself has joined #archiveteam-bs
[23:40] *** xmc has joined #archiveteam-bs
[23:40] *** swebb sets mode: +o xmc
[23:41] I filed a bug for wpull
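The "unicode -> utf-8 -> %XX%XX%XX for the path" workaround suggested at 23:27 might look something like this with Python's standard library. It is a minimal sketch, not what grab-site or wpull do internally: `percent_encode_path` is a hypothetical helper name, the example URL is made up, and non-ASCII host names (which would need IDNA handling) are left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(url):
    """Percent-encode the non-ASCII characters in a URL's path as UTF-8 bytes."""
    parts = urlsplit(url)
    # quote() encodes str input as UTF-8 by default. "/" is kept so the path
    # structure survives, and "%" is kept so existing escapes are not doubled.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Hypothetical URL with a non-ASCII character in the path.
encoded = percent_encode_path("http://example.com/news/caf\u00e9")
```

Because "%" is treated as safe, running the helper on an already-encoded URL leaves it unchanged, so it can be applied to a mixed list of input URLs before they are handed to the crawler.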