#archiveteam-bs 2015-12-26,Sat


Time Nickname Message
01:28 🔗 godane i'm at 563k items now
01:52 🔗 RichardG has quit IRC (Ping timeout: 499 seconds)
02:01 🔗 RichardG has joined #archiveteam-bs
02:18 🔗 RichardG has quit IRC (Ping timeout: 615 seconds)
02:33 🔗 Ravenloft do you guys think Kim Dotcom will be extradited to US?
02:41 🔗 RichardG has joined #archiveteam-bs
02:56 🔗 RichardG has quit IRC (Ping timeout: 250 seconds)
03:09 🔗 RichardG has joined #archiveteam-bs
03:59 🔗 Sketchcow Probably.
04:08 🔗 godane Turning_Point_Presents_-_Super_Sheep_199x_VHSRip
04:08 🔗 godane http://archive.org/details/Turning_Point_Presents_-_Super_Sheep_199x_VHSRip
04:09 🔗 godane https://archive.org/details/NASA_-_The_First_25_Years_-_Good_Times_Home_Video_1987_VHSRip
04:17 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
04:25 🔗 Nertsy has joined #archiveteam-bs
05:41 🔗 JetBalsa has quit IRC (Read error: Connection reset by peer)
07:00 🔗 godane https://archive.org/details/The_Making_of_the_Stooges_1984_VHSRip
07:18 🔗 JesseW has quit IRC (Leaving.)
08:58 🔗 robink has joined #archiveteam-bs
09:16 🔗 BlueMaxim has quit IRC (Quit: Leaving)
09:50 🔗 schbirid has joined #archiveteam-bs
09:55 🔗 godane https://archive.org/details/Breakin_In_The_USA_1984_VHSRip
10:46 🔗 VADemon has quit IRC (left4dead)
14:32 🔗 schbirid https://events.ccc.de/congress/2015/wiki/Lightning:Internet_Radio_Recorder
14:33 🔗 schbirid https://events.ccc.de/congress/2015/wiki/Static:Crawling
15:46 🔗 marvinw is now known as ivan`
15:48 🔗 ivan` do IA's massaged URLs (in their CDXes) cause problems in practice? I see that they always lowercase, which could cause problems with things like imgur, but I don't know if I've ever observed problems
15:48 🔗 ivan` investigating this because I'm going to load a lot of CDXes into a database
15:50 🔗 ivan` hmm, I guess if you get multiple results for a massaged URL, you can look up an exact-case match
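A minimal sketch of the lookup ivan` describes above: index captures by their massaged (lowercased) key, then pick the exact-case match when a key has several candidates. The `massage`, `build_index`, and `lookup` helpers are hypothetical names for illustration, and lowercasing alone is an assumption that stands in for whatever canonicalization the CDX files really use.

```python
from collections import defaultdict


def massage(url: str) -> str:
    """Hypothetical stand-in for CDX-style canonicalization: lowercase only."""
    return url.lower()


def build_index(urls):
    """Group original URLs under their massaged key."""
    index = defaultdict(list)
    for url in urls:
        index[massage(url)].append(url)
    return index


def lookup(index, url):
    """Return the exact-case capture for url, or None if only other-case
    variants share its massaged key."""
    candidates = index.get(massage(url), [])
    return url if url in candidates else None
```

Returning None instead of a different-case candidate is a design choice in this sketch: it avoids reporting a capture that is actually a different resource.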
15:58 🔗 arkiver ivan`: we got the problem with newsgrabber figured out
15:58 🔗 arkiver it was due to encoding problems
15:58 🔗 arkiver in this case with the Dari language
16:02 🔗 schbirid has quit IRC (Quit: Leaving)
16:20 🔗 schbirid has joined #archiveteam-bs
16:30 🔗 ivan` arkiver: ok if it's a grab-site thing please file a bug
17:04 🔗 ivan` "This module depends on the tldextract module to query the Public Suffix List. tldextract can be installed via pip" https://github.com/rajbot/surt
17:05 🔗 ivan` that is worrying to say the least
17:05 🔗 ivan` what happens when the list changes and SURTs don't match
17:13 🔗 godane https://archive.org/details/We_Are_the_World_-_The_Story_Behind_the_Song_ATV-10_1987
17:23 🔗 ivan` oh, it implements some public suffix thing but it's behind a boolean that's always False
17:24 🔗 HCross Sketchcow, can you please move the Cryengine files from godane to the IA please
17:37 🔗 godane https://archive.org/details/The_Red_Nose_Express_1987_VHSRip
17:41 🔗 JesseW has joined #archiveteam-bs
17:57 🔗 arkiver ivan`: I'll do that
17:57 🔗 arkiver I found a very strange problem
17:57 🔗 arkiver ~/.local/bin/grab-site http://www.eqmweekly.com.af/international/8288-???????-??-?????-?????-????? --level=0 --no-sitemaps --concurrency=5 --1 --warc-max-size=524288000 --wpull-args="--no-check-certificate --timeout=300"
17:57 🔗 arkiver that works
17:58 🔗 arkiver ~/.local/bin/grab-site http://www.eqmweekly.com.af/technology/8287-???-???????-???-???-??-??????? --level=0 --no-sitemaps --concurrency=5 --1 --warc-max-size=524288000 --wpull-args="--no-check-certificate --timeout=300"
17:58 🔗 arkiver that does not work
17:59 🔗 JetBalsa has joined #archiveteam-bs
18:27 🔗 JesseW has quit IRC (Leaving.)
18:46 🔗 VADemon has joined #archiveteam-bs
19:05 🔗 godane https://archive.org/details/1994-05-12_David_Copperfield_15_Years_of_Magic
19:29 🔗 ohhdemgir midas,
19:29 🔗 ohhdemgir get in #effteepee
19:29 🔗 ohhdemgir then shout at me
19:57 🔗 Stilett0 has joined #archiveteam-bs
19:58 🔗 Stiletto has quit IRC (Read error: Operation timed out)
20:11 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
20:20 🔗 yipdw dumping a postgresql database over inflight wifi is not the best experience
20:37 🔗 CatButts hurp
20:40 🔗 DFJustin ivan`: I have seen wayback return the wrong imgur image if there is a case-insensitive match
20:41 🔗 DFJustin I'm not sure what happens if there are multiple matches, one of which is exact
20:45 🔗 CatButts I want to make sweet sweet love
20:46 🔗 CatButts to a womancat
20:54 🔗 ohhdemgir yipdw, >inflight wifi is not the best experience
20:54 🔗 yipdw I did indeed write that, yes
21:17 🔗 BlueMaxim has joined #archiveteam-bs
21:22 🔗 godane https://archive.org/details/1989-07-26_Japan_TV
21:36 🔗 SmileyG Sooooooo
21:37 🔗 SmileyG at some point the FAA will put up a public list
21:37 🔗 SmileyG of all registered drone owners
21:37 🔗 SmileyG .... publicly searchable etc
21:37 🔗 godane https://archive.org/details/Fisher-Price_Grimms_Fairy_Tales_-_The_Frog_Prince_1989_VHSRip
22:02 🔗 xmc has quit IRC (Read error: Operation timed out)
22:02 🔗 RichardG_ has joined #archiveteam-bs
22:03 🔗 yakfish has quit IRC (Read error: Operation timed out)
22:03 🔗 myself has quit IRC (Read error: Operation timed out)
22:03 🔗 robink has quit IRC (Write error: Broken pipe)
22:03 🔗 sep332 has quit IRC (Write error: Broken pipe)
22:03 🔗 beardicus has quit IRC (Read error: Operation timed out)
22:04 🔗 botpie91 has quit IRC (Read error: Operation timed out)
22:06 🔗 RichardG has quit IRC (Read error: Operation timed out)
22:09 🔗 Zebranky has quit IRC (Read error: Operation timed out)
22:09 🔗 Zebranky has joined #archiveteam-bs
22:09 🔗 JetBalsa has quit IRC (Read error: Operation timed out)
22:10 🔗 JetBalsa has joined #archiveteam-bs
22:10 🔗 rduser has quit IRC (Read error: Operation timed out)
22:10 🔗 rduser has joined #archiveteam-bs
22:10 🔗 godane https://archive.org/details/In_The_Aftermath_New_World_Entertainment_1988_VHSRip
22:11 🔗 Sketchcow has quit IRC (Read error: Operation timed out)
22:12 🔗 is- has quit IRC (Read error: Operation timed out)
22:12 🔗 is-_ has joined #archiveteam-bs
22:13 🔗 Baljem_ has quit IRC (Read error: Operation timed out)
22:14 🔗 Sketchcow has joined #archiveteam-bs
22:14 🔗 midas sets mode: +o Sketchcow
22:14 🔗 swebb sets mode: +o Sketchcow
22:14 🔗 GLaDOS sets mode: +o Sketchcow
22:19 🔗 Baljem has joined #archiveteam-bs
22:30 🔗 is-_ is now known as is-
22:30 🔗 kyan has joined #archiveteam-bs
22:35 🔗 schbirid has quit IRC (Quit: Leaving)
22:40 🔗 kyan has quit IRC (Quit: This computer has gone to sleep)
22:45 🔗 ivan` DFJustin: it looks like it prefers the latest snapshot instead of the exact-case match
22:46 🔗 ivan` I just contaminated https://news.ycombinator.com/user?id=rms with https://news.ycombinator.com/user?id=RMS in wayback
22:46 🔗 ivan` I'm probably going to have domain-specific rules for my massaged URLs and re-generate them whenever I add new rules
22:47 🔗 ivan` even if you priority exact-case matches it's bad UX to tell a user you have something when it's the wrong thing
22:47 🔗 ivan` prioritize
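One way the domain-specific rules mentioned above could look, as a rough sketch only: lowercase the path and query by default, but preserve case for hosts known to be case-sensitive. The `CASE_SENSITIVE_HOSTS` set and the `massage_url` function are hypothetical, the host list is just the examples from this conversation, and the sketch drops ports and fragments for brevity.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: hosts where path/query case is significant (examples from the log).
CASE_SENSITIVE_HOSTS = {"imgur.com", "i.imgur.com", "news.ycombinator.com"}


def massage_url(url: str) -> str:
    """Canonicalize a URL, keeping path/query case only for listed hosts."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostnames are case-insensitive
    path, query = parts.path, parts.query
    if host not in CASE_SENSITIVE_HOSTS:
        path, query = path.lower(), query.lower()
    # Port and fragment are dropped in this sketch.
    return urlunsplit((parts.scheme.lower(), host, path, query, ""))
```

Because the output depends on the rule set, any stored massaged URLs would need to be regenerated whenever a host is added to or removed from the list.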
23:05 🔗 ivan` arkiver: works for me. I assume you are quoting URLs with question marks if you are dumping them into a shell?
23:19 🔗 arkiver ivan`: for me only the first line works. And then I just dump the exact same line as I pasted above in the terminal
23:22 🔗 ivan` arkiver: can you paste an error?
23:22 🔗 arkiver sorry, they don't contain question marks
23:22 🔗 arkiver wait I'll put them up somewhere else
23:24 🔗 Stiletto has joined #archiveteam-bs
23:25 🔗 arkiver ivan`: https://ia601500.us.archive.org/35/items/testlinesurls36943/testlines.txt
23:25 🔗 arkiver you should see some kind of Arabic characters
23:26 🔗 arkiver only the first line works for me
23:26 🔗 ivan` heh yes finally an error
23:26 🔗 ivan` (I see it here)
23:26 🔗 arkiver the second line gives a 'URL is not printable' error
23:26 🔗 arkiver ok
23:27 🔗 ivan` arkiver: I blame wpull. try encoding your input URLs?
23:27 🔗 arkiver utf-8?
23:27 🔗 ivan` urlencoding, that is, unicode -> utf-8 -> %XX%XX%XX for the path
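The unicode -> utf-8 -> %XX step can be sketched with Python's standard library. This is an illustration of the workaround, not what grab-site or wpull do internally; `encode_url_path` is a hypothetical helper name and the example URL is made up.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode non-ASCII characters in the path of a URL.

    quote() encodes text as UTF-8 by default and emits %XX per byte.
    "/" and "%" are kept as-is so path separators survive and
    already-encoded input is not encoded a second time.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(encode_url_path("http://example.com/técnica/8287-á"))
```

The encoded URL is plain ASCII, so it can be passed on a command line (quoted for the shell) without depending on the terminal's or the tool's character handling.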
23:27 🔗 arkiver yeah
23:27 🔗 arkiver sorry, not very into encoding
23:29 🔗 ivan` I suppose I should either fix this in grab-site or wpull
23:31 🔗 arkiver seems to be working with encoding them first
23:31 🔗 botpie91 has joined #archiveteam-bs
23:31 🔗 arkiver I feel this is more a wpull problem
23:31 🔗 yakfish has joined #archiveteam-bs
23:32 🔗 robink has joined #archiveteam-bs
23:33 🔗 beardicus has joined #archiveteam-bs
23:34 🔗 sep332 has joined #archiveteam-bs
23:36 🔗 myself has joined #archiveteam-bs
23:40 🔗 xmc has joined #archiveteam-bs
23:40 🔗 swebb sets mode: +o xmc
23:41 🔗 arkiver I filed a bug for wpull
