#archiveteam 2014-12-23,Tue

Time Nickname Message
00:00 🔗 Fusl i have to keep the server until july anyway
00:00 🔗 Fusl will just downgrade ram, cpu, ... to make it a little bit cheaper
00:00 🔗 arkiver ok
00:00 🔗 arkiver I probably have everything finished in a few weeks
00:04 🔗 DFJustin has quit IRC (Ping timeout: 740 seconds)
00:07 🔗 DFJustin has joined #archiveteam
00:07 🔗 swebb sets mode: +o DFJustin
00:16 🔗 maltris has quit IRC (Ping timeout: 246 seconds)
00:22 🔗 maltris has joined #archiveteam
00:46 🔗 aaaaaaaa_ has joined #archiveteam
00:53 🔗 aaaaaaaaa has quit IRC (Read error: Operation timed out)
00:54 🔗 aaaaaaaa_ is now known as aaaaaaaaa
01:05 🔗 Ravenloft has quit IRC (Remote host closed the connection)
01:17 🔗 ivan`_ has quit IRC (Ping timeout: 250 seconds)
01:19 🔗 ivan` has joined #archiveteam
01:22 🔗 Sellyme_ has joined #archiveteam
01:23 🔗 fenn_ has joined #archiveteam
01:26 🔗 danneh_ has quit IRC (hub.se efnet.port80.se)
01:26 🔗 fenn has quit IRC (hub.se efnet.port80.se)
01:26 🔗 thechip has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Nemo_bis has quit IRC (hub.se efnet.port80.se)
01:26 🔗 GLaDOS has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Sellyme has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Kazzy has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Stary2001 has quit IRC (hub.se efnet.port80.se)
01:26 🔗 fresco___ has quit IRC (hub.se efnet.port80.se)
01:26 🔗 fluff_ has quit IRC (hub.se efnet.port80.se)
01:26 🔗 RainbowCo has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Muad-Dib has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Kniffy has quit IRC (hub.se efnet.port80.se)
01:26 🔗 lhobas has quit IRC (hub.se efnet.port80.se)
01:26 🔗 parsons has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Shank___ has quit IRC (hub.se efnet.port80.se)
01:26 🔗 deathy has quit IRC (hub.se efnet.port80.se)
01:26 🔗 VonScoot has quit IRC (hub.se efnet.port80.se)
01:26 🔗 Riviera has quit IRC (hub.se efnet.port80.se)
01:27 🔗 Kazzy_ has joined #archiveteam
01:41 🔗 Kazzy_ is now known as Kazzy
01:50 🔗 godane has quit IRC (Ping timeout: 615 seconds)
01:51 🔗 godane has joined #archiveteam
02:03 🔗 sep332 has quit IRC (Ping timeout: 615 seconds)
02:03 🔗 sep332 has joined #archiveteam
02:06 🔗 Nertsy has quit IRC (Read error: Operation timed out)
02:06 🔗 pft has quit IRC (Read error: Operation timed out)
02:06 🔗 Jogie_ has quit IRC (Read error: Operation timed out)
02:08 🔗 Nertsy has joined #archiveteam
02:10 🔗 khaoohs has quit IRC (Read error: Operation timed out)
02:11 🔗 mistym has quit IRC (hub.efnet.us irc.paraphysics.net)
02:11 🔗 nertzy has quit IRC (hub.efnet.us irc.paraphysics.net)
02:11 🔗 phuzion has quit IRC (hub.efnet.us irc.paraphysics.net)
02:11 🔗 Sue_ has quit IRC (hub.efnet.us irc.paraphysics.net)
02:14 🔗 khaoohs_ has joined #archiveteam
02:14 🔗 mistym has joined #archiveteam
02:14 🔗 nertzy has joined #archiveteam
02:14 🔗 phuzion has joined #archiveteam
02:14 🔗 Sue_ has joined #archiveteam
02:16 🔗 Jogie has joined #archiveteam
02:17 🔗 primus104 has quit IRC (Read error: Operation timed out)
02:17 🔗 primus has quit IRC (Read error: Operation timed out)
02:17 🔗 wp494 has quit IRC ()
02:17 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:17 🔗 dashcloud has joined #archiveteam
02:17 🔗 primus has joined #archiveteam
02:22 🔗 primus104 has joined #archiveteam
02:24 🔗 khaoohs__ has joined #archiveteam
02:24 🔗 khaoohs_ has quit IRC (Read error: Connection reset by peer)
02:29 🔗 nertzy2 has joined #archiveteam
02:30 🔗 phuzion_ has joined #archiveteam
02:31 🔗 Sue_ has quit IRC (Ping timeout: 246 seconds)
02:31 🔗 Sue_ has joined #archiveteam
02:33 🔗 phuzion has quit IRC (Read error: Connection reset by peer)
02:35 🔗 Sue_ has quit IRC (Ping timeout: 246 seconds)
02:36 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
02:36 🔗 nertzy has quit IRC (Ping timeout: 615 seconds)
02:36 🔗 Froggypwn has joined #archiveteam
02:39 🔗 ruukasu has joined #archiveteam
02:39 🔗 Start has joined #archiveteam
02:45 🔗 wp494 has joined #archiveteam
02:50 🔗 Sue_ has joined #archiveteam
02:50 🔗 human39 has quit IRC (Leaving)
03:05 🔗 Start has quit IRC (Ping timeout: 480 seconds)
03:07 🔗 Start has joined #archiveteam
03:13 🔗 rejon has joined #archiveteam
03:19 🔗 Start has quit IRC (Quit: Leaving)
03:54 🔗 primus104 has quit IRC (Leaving.)
04:16 🔗 pft has joined #archiveteam
04:48 🔗 Ymgve has quit IRC ()
05:03 🔗 mistym has quit IRC (Remote host closed the connection)
05:05 🔗 aaaaaaaaa has quit IRC (Leaving)
05:06 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:10 🔗 dashcloud has joined #archiveteam
05:15 🔗 rejon has quit IRC (Ping timeout: 480 seconds)
05:17 🔗 pft has quit IRC (ny.us.hub irc.paraphysics.net)
05:17 🔗 Sue_ has quit IRC (ny.us.hub irc.paraphysics.net)
05:20 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:23 🔗 dashcloud has joined #archiveteam
05:27 🔗 pft has joined #archiveteam
05:27 🔗 Sue_ has joined #archiveteam
05:29 🔗 brayden has quit IRC (Read error: Operation timed out)
05:29 🔗 mistym has joined #archiveteam
05:42 🔗 GLaDOS has joined #archiveteam
05:42 🔗 swebb sets mode: +o GLaDOS
05:42 🔗 brayden has joined #archiveteam
05:43 🔗 fluff_ has joined #archiveteam
05:51 🔗 APerti has joined #archiveteam
06:27 🔗 trs80 has quit IRC (hub.efnet.us irc.umich.edu)
06:41 🔗 trs80 has joined #archiveteam
06:47 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
07:01 🔗 thefox has joined #archiveteam
07:05 🔗 the_fox has quit IRC (Read error: Operation timed out)
07:08 🔗 ruukasu has joined #archiveteam
07:55 🔗 dashcloud has quit IRC (Read error: Operation timed out)
07:58 🔗 dashcloud has joined #archiveteam
08:14 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
08:40 🔗 brayden has quit IRC (Ping timeout: 606 seconds)
08:43 🔗 brayden has joined #archiveteam
08:45 🔗 primus104 has joined #archiveteam
08:49 🔗 signius has joined #archiveteam
09:04 🔗 ruukasu has joined #archiveteam
09:05 🔗 schbirid has joined #archiveteam
09:26 🔗 brayden has quit IRC (Read error: Operation timed out)
09:33 🔗 mistym has quit IRC (Remote host closed the connection)
09:48 🔗 brayden has joined #archiveteam
10:15 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:17 🔗 APerti has quit IRC (Ping timeout: 370 seconds)
10:19 🔗 APerti has joined #archiveteam
10:27 🔗 okeuday has quit IRC (Ping timeout: 480 seconds)
10:38 🔗 okeuday has joined #archiveteam
11:11 🔗 Riviera has joined #archiveteam
11:21 🔗 APerti has quit IRC (Ping timeout: 370 seconds)
11:34 🔗 danneh_ has joined #archiveteam
11:44 🔗 Ymgve has joined #archiveteam
12:12 🔗 deathy has joined #archiveteam
12:39 🔗 primus104 has quit IRC (Leaving.)
13:08 🔗 signius has quit IRC (Read error: Operation timed out)
13:20 🔗 signius has joined #archiveteam
13:47 🔗 eprillios has quit IRC (Read error: Operation timed out)
14:16 🔗 APerti has joined #archiveteam
14:25 🔗 APerti has quit IRC (Ping timeout: 370 seconds)
14:36 🔗 SadDM has quit IRC (leaving)
14:36 🔗 SadDM has joined #archiveteam
14:38 🔗 sankin has joined #archiveteam
14:40 🔗 Start has joined #archiveteam
14:53 🔗 thechip_ has joined #archiveteam
14:53 🔗 SadDM has quit IRC (leaving)
14:55 🔗 SadDM has joined #archiveteam
14:55 🔗 swebb sets mode: +o SadDM
14:58 🔗 Start http://computing.vt.edu/kb/entry/3997
14:59 🔗 SadDM has left
14:59 🔗 Start The Filebox service will be shut down on December 31, 2014. After December 31, 2014, you will no longer be able to access files located on Filebox, so download a copy of your files and Web sites now.
15:08 🔗 SadDM has joined #archiveteam
15:08 🔗 swebb sets mode: +o SadDM
15:09 🔗 SadDM has left
15:13 🔗 SadDM has joined #archiveteam
15:13 🔗 swebb sets mode: +o SadDM
15:14 🔗 Start has quit IRC (Ping timeout: 606 seconds)
15:14 🔗 SadDM has left
15:18 🔗 SadDM has joined #archiveteam
15:18 🔗 swebb sets mode: +o SadDM
15:29 🔗 primus104 has joined #archiveteam
15:30 🔗 Start has joined #archiveteam
15:31 🔗 Start has quit IRC (Client Quit)
15:54 🔗 db48x has joined #archiveteam
16:29 🔗 aaaaaaaaa has joined #archiveteam
16:38 🔗 joepie91 well, at least it was almost 2 months ahead of time...
16:38 🔗 joepie91 does filebox have any public data?
16:40 🔗 Kazzy seems to have public sites, under http://filebox.vt.edu/users/
16:40 🔗 Kazzy http://filebox.vt.edu/users/cdgorman/
16:43 🔗 SadDM has left
16:43 🔗 joepie91 oh god
16:43 🔗 joepie91 https://www.google.nl/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=site%3Afilebox.vt.edu%20inurl%3Ausers
16:43 🔗 joepie91 that does not look good...
16:44 🔗 joepie91 Kazzy: wat do?
16:44 🔗 Kazzy well, 21.3k results.. that feels like a warrior project
16:45 🔗 SadDM has joined #archiveteam
16:45 🔗 swebb sets mode: +o SadDM
16:48 🔗 schbirid censorship? :) https://filebox.vt.edu/robots.txt
16:48 🔗 SadDM has left
16:48 🔗 SadDM has joined #archiveteam
16:48 🔗 swebb sets mode: +o SadDM
16:49 🔗 SadDM has quit IRC (Client Quit)
16:49 🔗 Kazzy I'd noticed that yeah, they all return 404's though..
16:49 🔗 SadDM has joined #archiveteam
16:49 🔗 swebb sets mode: +o SadDM
16:49 🔗 schbirid "The default quota (space limitation) for Filebox is currently 30 MB."
16:50 🔗 schbirid https://computing.vt.edu/content/filebox-documentation#space
16:56 🔗 SadDM has quit IRC (leaving)
16:57 🔗 SadDM has joined #archiveteam
16:57 🔗 swebb sets mode: +o SadDM
17:01 🔗 db48x has quit IRC (Ping timeout: 258 seconds)
17:02 🔗 joepie91 we would have like a week for a warrior project
17:02 🔗 joepie91 and I'll be going to 31c3, so I have no time for it
17:02 🔗 joepie91 :/
17:04 🔗 schbirid oh noes, does that mean we will have to meet? D: D: D:
17:04 🔗 schbirid :)
17:04 🔗 schbirid nico: you coming again too?
17:05 🔗 joepie91 schbirid: yes, obviously! :P
17:05 🔗 joepie91 archiveteam meeting required
17:05 🔗 eprillios has joined #archiveteam
17:05 🔗 schbirid :)
17:05 🔗 joepie91 also whoop whoop whoop offtopic siren
17:05 🔗 schbirid absolutely
17:05 🔗 schbirid oops
17:09 🔗 Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~)
17:09 🔗 Kazzy if someone can get a google/bing/whatever scrape for filebox.vt.edu URLs, we could possibly get somewhere
17:20 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:24 🔗 dashcloud has joined #archiveteam
17:58 🔗 arkiver SketchCow: 500.000 items will be added to the clip art project. FOS is the target
18:26 🔗 primus104 has quit IRC (Leaving.)
18:28 🔗 Elegance has quit IRC (Quit: :(){ :|:& };:)
18:41 🔗 asd_ has joined #archiveteam
18:41 🔗 asd_ has quit IRC (Client Quit)
18:45 🔗 Emcy_ has joined #archiveteam
18:47 🔗 Emcy has quit IRC (Read error: Operation timed out)
18:50 🔗 bzc6p__ has joined #archiveteam
18:50 🔗 bzc6p__ is now known as bzc6p
18:51 🔗 bzc6p I've started scraping Google
18:51 🔗 bzc6p for filebox.vt.edu
18:51 🔗 mistym has joined #archiveteam
18:52 🔗 arkiver bzc6p: do you think we need a project for it?
18:52 🔗 bzc6p arkiver: I don't think anything, Kazzy just said we need a scrape and I started to scrape.
18:53 🔗 bzc6p "I've no idea what it is, but I'm helping in archiving it"
18:53 🔗 Kazzy arkiver: we're looking at 21.3k results from google
18:53 🔗 Kazzy maximum filesize 30MB
18:53 🔗 Kazzy bzc6p: thanks for the scrape, was busy earlier :)
18:54 🔗 bzc6p I'll just spit out an url list soon.
18:55 🔗 Elegance has joined #archiveteam
18:56 🔗 arkiver so filebox profiles are made like http://filebox.vt.edu/users/*user*/
18:57 🔗 arkiver might be possible to do with archivebot
18:57 🔗 arkiver provide a !a < list
18:57 🔗 Kazzy from the results i've seen, yes
18:57 🔗 Kazzy it's possible with archivebot, we have a week
18:58 🔗 arkiver yeah
19:03 🔗 arkiver microsoft clip art is running
19:03 🔗 arkiver next up: Nokia Memories
19:11 🔗 db48x has joined #archiveteam
19:33 🔗 bzc6p arkiver: regarding filebox, a user's site can be reached at least three ways.
19:34 🔗 arkiver hmm?
19:34 🔗 bzc6p e.g. for "rmtaylor" all of the following work:
19:34 🔗 bzc6p /users/rmtaylor
19:34 🔗 bzc6p /r/rmtaylor
19:34 🔗 bzc6p /~rmtaylor
19:35 🔗 bzc6p I guess we reach the same files each way, but how many broken links do we leave behind?
19:36 🔗 bzc6p However, grabbing everything in all 3 ways would be practically doing the same thing 3 times.
19:37 🔗 arkiver If the whole thing would not be too big in size and the website is fast enough, we'll do all three ways
19:37 🔗 arkiver I guess a warrior project would be good for this then
19:37 🔗 bzc6p And no, it doesn't seem to be redirecting (the url in my browser remains the same path)
19:38 🔗 Kazzy can't we just process that afterwards, once we've grabbed one copy?
19:38 🔗 arkiver then it could be done with archivebot too
19:38 🔗 arkiver Kazzy: can do that, but if we have enough time and site can handle it, it's better to do the three ways at the start
19:41 🔗 bzc6p arkiver: you may have misunderstood me. So the situation is NOT that two redirect to a third one. It appears as if they all exist separately.
19:41 🔗 arkiver Yes I know
19:42 🔗 bzc6p Okay, I wasn't sure.
19:42 🔗 arkiver so that can be grabbed with archivebot, if that is fast enough
19:42 🔗 arkiver just three different urls for each found url
19:42 🔗 bzc6p Hopefully just three.
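The three access paths bzc6p lists above can be expanded mechanically. A minimal sketch in Python, assuming the `/r/` form is the first letter of the user name (only the `rmtaylor` example appears in the log, so that rule is a guess) and that the host is `filebox.vt.edu` without `www.`; the function name `user_url_variants` is hypothetical:

```python
def user_url_variants(user, rest="", host="filebox.vt.edu"):
    """Return the three URL forms under which a Filebox user path seems reachable."""
    # Assumption: the single-letter directory is the first letter of the user name.
    prefixes = [
        f"/users/{user}",
        f"/{user[0]}/{user}",
        f"/~{user}",
    ]
    # `rest` is the path below the user directory ("" means the main page).
    return [f"{host}{prefix}/{rest}" for prefix in prefixes]
```

Calling `user_url_variants("rmtaylor")` gives the three main-page forms for that user; passing a file path as `rest` gives the same file under each form.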
19:44 🔗 aaaaaaaaa well, archivebot had the deduplicates, although I'm not sure how that works in the warc
19:45 🔗 aaaaaaaaa err, archivebot has a spotter for duplicates, but I'm not sure how that looks in the warc.
19:46 🔗 bzc6p arkiver: do we suppose that all of a user's files are reachable from their site, or shall I also create a list of individual files listed by Google?
19:47 🔗 arkiver bzc6p: from what I have seen, not all files are linked to from the main page
19:47 🔗 arkiver so it would be best to have the full list of individual urls
19:52 🔗 bzc6p So I've got a list of URLs from Google and Common Crawl. (I don't speak Bing.) Now I'll create a list of user main pages and the found individual files in all three ways.
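Turning scraped URLs into a list of user main pages could look roughly like the sketch below. It is not the script bzc6p used; the regular expression and the name `extract_users` are assumptions, and the single-character directory form is guessed from the one `/r/rmtaylor` example in the log:

```python
import re

# Assumed URL shapes: /users/NAME, /~NAME, or /X/NAME with X a single character.
USER_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?filebox\.vt\.edu/(?:users/|~|[a-z0-9]/)([^/?#]+)",
    re.IGNORECASE,
)

def extract_users(urls):
    """Collect the distinct user names found in a list of scraped URLs."""
    users = set()
    for url in urls:
        match = USER_RE.match(url.strip())
        if match:
            users.add(match.group(1))
    return sorted(users)
```

URLs that do not fit one of the three shapes, such as `/robots.txt`, are skipped rather than guessed at.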
19:53 🔗 sankin has quit IRC (Leaving.)
20:00 🔗 BlueMaxim has joined #archiveteam
20:02 🔗 godane looks like archiveteam doesn't display sub-collection anymore
20:02 🔗 godane SketchCow: i would like to know why
20:03 🔗 primus104 has joined #archiveteam
20:03 🔗 SketchCow ha ha
20:03 🔗 SketchCow So, something is broken.
20:03 🔗 SketchCow I've been raising it with the devs
20:07 🔗 godane i think if you edit archiveteam collection you can fix it
20:08 🔗 SketchCow No.
20:08 🔗 SketchCow It's endemic throughout the system. Code changed.
20:08 🔗 godane oh
20:08 🔗 godane ok
20:08 🔗 SketchCow I'd been bringing it up elsewhere.
20:08 🔗 godane i only really noticed it in archiveteam collection
20:08 🔗 SketchCow The lead dev really wants us moving to the v2 version of archive.org, and she doesn't always care if v1 stuff is affected.
20:09 🔗 SketchCow Also, we have had a number of more aggressive "cleanup" routines running, which I've been contributing to.
20:09 🔗 SketchCow Sometimes they get a tad saucy.
20:09 🔗 godane fyi my bug in cbsnews.com videos collection happened to an item in archiveteam-fire
20:09 🔗 godane from 2011
20:09 🔗 godane and its not ximm fault this time
20:09 🔗 godane based on history
20:10 🔗 godane SketchCow: https://archive.org/details/forum.nos.org-2007
20:10 🔗 godane it has ubuntu arc.gz files in it
20:12 🔗 godane good news is they are released to the domain
20:12 🔗 godane forums.nos.org domain
20:13 🔗 godane at least that one is a web archive and someone else's problem
20:14 🔗 godane but also shows there is bad code in archive.org upload script it looks like
20:14 🔗 godane like it will upload to any item with its domain in it
20:32 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
20:35 🔗 ruukasu has joined #archiveteam
20:35 🔗 amerrykan has quit IRC (Quit: Quitting)
21:06 🔗 amerrykan has joined #archiveteam
21:11 🔗 dashcloud has quit IRC (Read error: Operation timed out)
21:14 🔗 dashcloud has joined #archiveteam
21:17 🔗 Start has joined #archiveteam
21:17 🔗 bzc6p arkiver: one last question. Does it matter if there is "www." at the beginning of the URL, or does the Wayback Machine not distinguish between that and the one without www? Because if they count as different, I should make two versions.
21:17 🔗 bzc6p I guess the answer is no, but I want to make sure.
21:18 🔗 schbirid www. and no www are different things and i am sure the WM differentiates
21:18 🔗 schbirid if one of them is canonical, use that
21:24 🔗 thechip_ has quit IRC (Ping timeout: 265 seconds)
21:26 🔗 DFJustin I think wayback has some smarts to roll them together
21:28 🔗 bzc6p Most of them are without www. But I think most of us omit www, although it may matter.
21:30 🔗 db48x` has joined #archiveteam
21:33 🔗 db48x has quit IRC (Ping timeout: 258 seconds)
21:34 🔗 SketchCow Problem has been fixed
21:38 🔗 aaaaaaaaa a quick test seems to indicate that it treats them the same, but not other subdomains
21:39 🔗 db48x` has quit IRC (Ping timeout: 258 seconds)
21:40 🔗 Start has quit IRC (Ping timeout: 265 seconds)
21:41 🔗 bzc6p So I've discovered ~1050 users, and have direct links to ~5600 files.
21:46 🔗 Start has joined #archiveteam
21:50 🔗 thechip has joined #archiveteam
21:53 🔗 Start has quit IRC (Ping timeout: 265 seconds)
21:53 🔗 Start has joined #archiveteam
21:55 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
21:56 🔗 ruukasu has joined #archiveteam
21:56 🔗 Start has quit IRC (Remote host closed the connection)
22:10 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
22:10 🔗 ruukasu has joined #archiveteam
22:16 🔗 bzc6p arkiver: list with filebox user main pages and with every discovered file, in all the three versions, is ready. Sites work with and without www. prefix without redirecting; but in my list they are without www., extend if you wish..
22:17 🔗 bzc6p http://paste.archivingyoursh.it/raw/xajobaqogo
22:29 🔗 bzc6p_ has joined #archiveteam
22:36 🔗 schbirid has quit IRC (Read error: Operation timed out)
22:37 🔗 bzc6p has quit IRC (Read error: Operation timed out)
22:40 🔗 APerti has joined #archiveteam
23:00 🔗 kyan has quit IRC (Quit: This computer has gone to sleep)
23:11 🔗 dashcloud has quit IRC (Read error: Operation timed out)
23:13 🔗 mhazinsk has joined #archiveteam
23:14 🔗 dashcloud has joined #archiveteam
23:17 🔗 aaaaaaaaa arkiver: I'm prepending http:// to the list bzc6p gave.
23:18 🔗 bzc6p_ aaaaaaaaa: I doubt that wpull needs that, but feel free
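Prepending the scheme to a scheme-less URL list, as aaaaaaaaa describes, is a one-pass transformation. A minimal sketch, assuming one URL per line; the name `with_scheme` is hypothetical, and whether wpull actually needs the prefix is left open in the log:

```python
def with_scheme(lines, scheme="http://"):
    """Prefix each non-empty line with a scheme unless it already has one."""
    result = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines in the pasted list
        if "://" not in url:
            url = scheme + url
        result.append(url)
    return result
```

Lines that already carry a scheme are passed through unchanged, so running it twice on the same list is harmless.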
23:20 🔗 arkiver wayback makes no difference between www. and no www.
23:20 🔗 APerti Can you think of a floppy disk game that has a proprietary file system on it to prevent copying?
23:23 🔗 db48x has joined #archiveteam
23:24 🔗 Ravenloft has joined #archiveteam
23:27 🔗 db48x has quit IRC (Remote host closed the connection)
23:29 🔗 useretail has quit IRC (Read error: Operation timed out)
23:29 🔗 garyrh has quit IRC (Read error: Operation timed out)
23:29 🔗 will__ has quit IRC (Read error: Operation timed out)
23:29 🔗 Void_ has quit IRC (Read error: Operation timed out)
23:29 🔗 aaaaaaaaa APerti: didn't some Amiga programs do that?
23:31 🔗 will__ has joined #archiveteam
23:31 🔗 Void_ has joined #archiveteam
23:32 🔗 aaaaaaaaa arkiver bzc6p_: I put it here http://paste.archivingyoursh.it/raw/kaviligefe
23:32 🔗 aaaaaaaaa maybe it should be fed into archivebot with a !a <
23:32 🔗 arkiver yep
23:34 🔗 garyrh has joined #archiveteam
23:37 🔗 APerti Love the site name!
23:38 🔗 APerti Awww...
23:38 🔗 APerti Domain: sh.it
23:38 🔗 APerti Status: UNASSIGNABLE
23:44 🔗 db48x has joined #archiveteam
23:45 🔗 APerti has quit IRC ()
23:53 🔗 mistym has quit IRC (Remote host closed the connection)
