#archiveteam 2016-08-23,Tue

Time Nickname Message
00:21 πŸ”— kristian_ has quit IRC (Leaving)
00:39 πŸ”— aschmitz Has anyone done any work on NPR's comments?
00:54 πŸ”— r3c0d3x Asked about this a few days back, didn't get any response, so I'd assume no.
01:47 πŸ”— HCross has quit IRC (Ping timeout: 246 seconds)
01:47 πŸ”— HCross has joined #archiveteam
01:54 πŸ”— khaoohs has joined #archiveteam
02:03 πŸ”— khaoohs has quit IRC (Quit: Leaving)
02:10 πŸ”— tomwsmf has quit IRC (Read error: Operation timed out)
02:30 πŸ”— mr-b has left
02:45 πŸ”— db48x has joined #archiveteam
03:10 πŸ”— db48x` has joined #archiveteam
03:11 πŸ”— db48x has quit IRC (Read error: Operation timed out)
03:14 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
03:15 πŸ”— BartoCH has joined #archiveteam
03:22 πŸ”— nicolas17 has quit IRC (Quit: U+1F634)
04:09 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
04:12 πŸ”— BartoCH has joined #archiveteam
04:17 πŸ”— JesseW has joined #archiveteam
04:17 πŸ”— Sk1d has quit IRC (Ping timeout: 250 seconds)
04:24 πŸ”— Sk1d has joined #archiveteam
04:26 πŸ”— JesseW we should probably get all the sites we can from http://www.users.totalise.co.uk as it appears to be a small ISP, in the process of being merged with another one (although they don't explicitly talk about shutting down the web sites)
04:29 πŸ”— JesseW !ig 28j6lpt5lmtyrdi4dhfugpmto squarespace\.com
04:35 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
04:35 πŸ”— HCross has quit IRC (Ping timeout: 246 seconds)
04:35 πŸ”— HCross has joined #archiveteam
04:43 πŸ”— DFJustin has quit IRC (Ping timeout: 260 seconds)
04:43 πŸ”— Meroje has quit IRC (Quit: bye!)
04:44 πŸ”— Meroje has joined #archiveteam
04:53 πŸ”— DFJustin has joined #archiveteam
04:53 πŸ”— swebb sets mode: +o DFJustin
05:05 πŸ”— DFJustin has quit IRC (Remote host closed the connection)
05:10 πŸ”— DFJustin has joined #archiveteam
05:15 πŸ”— HCross has quit IRC (Read error: Operation timed out)
05:15 πŸ”— HCross has joined #archiveteam
05:45 πŸ”— JesseW I'm in the process of grabbing the ones I can find with archivebot
05:51 πŸ”— quails has quit IRC (Ping timeout: 250 seconds)
05:56 πŸ”— quails has joined #archiveteam
05:57 πŸ”— phuzion has quit IRC (Read error: Operation timed out)
05:58 πŸ”— phuzion has joined #archiveteam
06:04 πŸ”— patrickod has quit IRC (Read error: Operation timed out)
06:04 πŸ”— patrickod has joined #archiveteam
06:05 πŸ”— phuzion has quit IRC (Read error: Operation timed out)
06:05 πŸ”— sep332 has quit IRC (Read error: Operation timed out)
06:05 πŸ”— midas1 has quit IRC (Read error: Operation timed out)
06:07 πŸ”— midas1 has joined #archiveteam
06:07 πŸ”— swebb sets mode: +o midas1
06:07 πŸ”— sep332 has joined #archiveteam
06:10 πŸ”— Fake-Name has quit IRC (Ping timeout: 501 seconds)
06:13 πŸ”— BlueMaxim has joined #archiveteam
06:13 πŸ”— phuzion has joined #archiveteam
06:13 πŸ”— Fake-Name has joined #archiveteam
06:49 πŸ”— zerbrnky has joined #archiveteam
06:49 πŸ”— zerbrnky hi all, anyone around?
06:49 πŸ”— tuankiet Any problem?
06:50 πŸ”— JesseW Zebranky: no. But ask whatever you were going to ask anyway...
06:50 πŸ”— zerbrnky hm i should use a different nick D:
06:51 πŸ”— zerbrnky i'm not Zebranky (i use a variant of this nick on places where longer names are allowed)
06:51 πŸ”— zerbrnky is now known as rbraun
06:51 πŸ”— JesseW oops, sorry
06:51 πŸ”— rbraun i was looking through the gawker dumps on archive.org and yeah, there might be a problem
06:52 πŸ”— JesseW well, a lot of our most recent stuff may not have made it up there yet
06:52 πŸ”— JesseW and we know about the robots.txt issues
06:52 πŸ”— JesseW is there a different problem?
06:52 πŸ”— rbraun it looks like they were grabbed by grabbing the sitemap for each month and then grabbing from there
06:52 πŸ”— rbraun the problem is that the sitemap for especially busy months can't be grabbed a whole month at a time
06:53 πŸ”— JesseW hm, yeah that could be an issue. godane?
06:53 πŸ”— rbraun so e.g. everything in January 2010 before 1/19 is missing from both this: https://archive.org/details/gawker.com-sitemap-2010-20160322
06:53 πŸ”— rbraun and from web.archive.org too
06:53 πŸ”— rbraun rather, not all of it is missing from web.archive.org, but some pages are
06:54 πŸ”— rbraun and many of the pages that /are/ there weren't crawled this year, indicating the bulk grab in march didn't hit them
06:54 πŸ”— rbraun this seems to be a bigger problem for older pages (probably back when they still paid their writers by the article)
06:54 πŸ”— JesseW do you know of a way to get a list of the missing pages?
06:55 πŸ”— rbraun yeah, you just see what the start date for the sitemap was and edit the end date to be that, iterate until it grabs thru the first of the month
06:55 πŸ”— rbraun i'm working on it now but i was wondering if anyone had already done it
06:56 πŸ”— rbraun e.g. january 2010 takes 3 pulls
06:56 πŸ”— rbraun and then of course all the pages...
06:56 πŸ”— rbraun (january 2012 is complete, though)
06:56 πŸ”— JesseW godane is the person who has been working on it; hopefully he'll speak up
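
A rough sketch of the workaround rbraun describes above: request the whole month from sitemap_bydate.xml, note the earliest date the truncated response actually reaches back to, then re-request with that as the new end time until the first of the month is covered. The endpoint and startTime/endTime parameters are as quoted later in this log; the assumption that each sitemap entry carries a usable <lastmod> date, and all helper names, are illustrative only.

    import requests
    import xml.etree.ElementTree as ET

    SITEMAP = "http://gawker.com/sitemap_bydate.xml"
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    def fetch_chunk(start, end):
        """One sitemap request; returns the article URLs and the earliest <lastmod> seen."""
        r = requests.get(SITEMAP, params={"startTime": start, "endTime": end}, timeout=60)
        r.raise_for_status()
        root = ET.fromstring(r.content)
        urls, dates = [], []
        for entry in root.findall("sm:url", NS):
            urls.append(entry.findtext("sm:loc", namespaces=NS))
            lastmod = entry.findtext("sm:lastmod", namespaces=NS)
            if lastmod:
                dates.append(lastmod[:10])
        return urls, (min(dates) if dates else None)

    def month_urls(year, month, last_day):
        """Walk the end time backwards until the chunk reaches the first of the month."""
        first = f"{year}-{month:02d}-01"
        end = f"{year}-{month:02d}-{last_day:02d}T23:59:59"
        seen = set()
        while True:
            urls, earliest = fetch_chunk(f"{first}T00:00:00", end)
            seen.update(u for u in urls if u)
            if not urls or earliest is None or earliest <= first:
                return sorted(seen)            # covered back to the 1st (or nothing returned)
            new_end = f"{earliest}T00:00:00"   # next pull: everything before what we just got
            if new_end == end:
                return sorted(seen)            # no progress; bail out rather than loop forever
            end = new_end

    # e.g. month_urls(2010, 1, 31) -- about 3 pulls for January 2010, per rbraun above
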
07:03 πŸ”— Honno has joined #archiveteam
07:11 πŸ”— JesseW has quit IRC (Ping timeout: 370 seconds)
07:11 πŸ”— rbraun is there a faster way to force wayback to crawl a list of URLs than just loading http://web.archive.org/save/[URL] for each?
07:13 πŸ”— PurpleSym Try #archivebot
07:18 πŸ”— rbraun oh, nice, there is a non-recursive option
07:20 πŸ”— rbraun archiveonly < FILE is probably what i need, thanks
07:21 πŸ”— rbraun when archivebot uploads a WARC to archive.org, does it end up in web.archive.org too?
07:21 πŸ”— rbraun in wayback, that is
07:23 πŸ”— PurpleSym Yes, that's the point.
07:23 πŸ”— rbraun ok, thanks, this looks easier than i thought
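
For reference, the brute-force fallback rbraun mentions above (loading web.archive.org/save/ for each URL) can be scripted as below. This is a minimal sketch with a polite delay, not the archivebot "archiveonly < FILE" route the channel recommends; the file name in the usage comment is hypothetical.

    import time
    import requests

    def save_to_wayback(urls, delay=5):
        """Ask the Wayback Machine to capture each URL via its /save/ endpoint."""
        for url in urls:
            try:
                r = requests.get("https://web.archive.org/save/" + url, timeout=120)
                print(r.status_code, url)
            except requests.RequestException as exc:
                print("failed:", url, exc)
            time.sleep(delay)  # be gentle; the save endpoint rate-limits aggressive use

    # usage: save_to_wayback(open("missing-2010.txt").read().split())
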
07:24 πŸ”— rbraun (fwiw i first discovered this issue when i noticed something from jan 2010 *wasn't* in wayback at all; then, found it wasn't in the collection i linked either)
07:28 πŸ”— rbraun gut feeling is that 2007-2011 are affected in part
07:28 πŸ”— rbraun (looking at http://gawkerdata.kinja.com/closing-the-book-on-gawker-com-1785555716)
07:31 πŸ”— REiN^ has quit IRC (Read error: Connection reset by peer)
07:33 πŸ”— phuzion has quit IRC (Read error: Operation timed out)
07:36 πŸ”— phuzion has joined #archiveteam
07:52 πŸ”— schbirid has joined #archiveteam
08:14 πŸ”— godane based on the sitemap it's 2010-01-19 on: gawker.com/sitemap_bydate.xml?startTime=2010-01-01T00:00:00&endTime=2010-01-31T23:59:59
08:14 πŸ”— godane ok i see the problem: http://gawker.com/sitemap_bydate.xml?startTime=2010-01-01T00:00:00&endTime=2010-01-01T23:59:59
08:15 πŸ”— godane sometimes those sitemaps do some weird shit
08:16 πŸ”— rbraun godane: do you have the missing ones or should i keep compiling them and feed them to archivebot?
08:16 πŸ”— rbraun i have 2010 almost ready
08:17 πŸ”— godane you can feed them into archivebot if you want to
08:17 πŸ”— godane i will also see about doing it
08:17 πŸ”— rbraun i checked several URLs from my file; some of them are in wayback and some not
08:18 πŸ”— rbraun ok; i'm working on 2010 but i think all of 2007-11 might be affected based on volume
08:18 πŸ”— rbraun (and the URLs not in wayback weren't saved in the big March dump, they were crawled earlier)
08:19 πŸ”— godane i may be doing daily grabs now
08:19 πŸ”— godane regrabs of what i got
08:19 πŸ”— rbraun also really uncertain how long any of the site will stay up so
08:20 πŸ”— godane i will work on gawker.com sitemap
08:20 πŸ”— rbraun note that in every case i saw, if the monthly grab by default returned through X date, the original grab had all of those articles
08:21 πŸ”— rbraun but not all the ones before that
08:26 πŸ”— godane i'm redumping gawker.com as a daily sitemap grab
08:30 πŸ”— godane kataku.com has the same problem
08:30 πŸ”— godane *kotaku.com
08:37 πŸ”— REiN^ has joined #archiveteam
08:43 πŸ”— WinterFox has joined #archiveteam
08:58 πŸ”— rbraun godane: do you want what i have for 2010? might save some time
08:59 πŸ”— godane it's not going to save me time sadly
09:00 πŸ”— godane my script makes a run at the sitemap by day now
09:00 πŸ”— rbraun ok
09:00 πŸ”— godane also i will have to do that with all of gawker sites
09:00 πŸ”— rbraun some of them i think don't have enough articles for this to have been an issue
09:01 πŸ”— rbraun not sure which ones though
09:01 πŸ”— godane i have uploaded some of those
09:01 πŸ”— godane they were in the 10 to 100 MB range
09:02 πŸ”— rbraun might save the crawler time at least to not have to recrawl what's known already in the archive?
09:02 πŸ”— BartoCH has joined #archiveteam
09:02 πŸ”— rbraun (several different ways to do that; i was just using the date cutoff)
09:05 πŸ”— rbraun also i'm not sure how much time is left for gawker.com specifically
09:08 πŸ”— godane btw the sitemap cut off is weird
09:09 πŸ”— godane like for 2008-11 i can get 3034 urls with gawker but only 1971 urls with kotaku.com
09:09 πŸ”— Medowar google code is empty. Can someone requeue
09:12 πŸ”— rbraun godane: there are fewer articles total for that month on kotaku though
09:13 πŸ”— rbraun godane: for 2008-11 if i request the whole month it cuts off at the 14th for gawker but the 6th for kotaku
09:13 πŸ”— rbraun oh, i see
09:13 πŸ”— rbraun yeah, why didn't it grab the whole month for kotaku...
09:13 πŸ”— rbraun fwiw their own sitemaps link in 1-week increments
09:14 πŸ”— rbraun http://gawker.com/sitemap.xml
09:14 πŸ”— rbraun not sure i trust that given how uneven it is but i haven't found a case where it failed yet
09:17 πŸ”— HCross has quit IRC (Ping timeout: 246 seconds)
09:17 πŸ”— HCross has joined #archiveteam
09:22 πŸ”— godane sitemaps for 2006-01 are starting to be uploaded: https://archive.org/details/gawker.com-sitemap-2006-01-09-20160823
09:24 πŸ”— godane i'm doing 11 months of daily sitemaps at once :-D
09:24 πŸ”— rbraun that's going to produce a lot of collections... any reason not to combine those by month?
09:24 πŸ”— rbraun also FYI while investigating this, the sitemap_bydate.xml was giving me 500 errors sometimes
09:25 πŸ”— rbraun that was reliable if i didn't request whole-day increments
09:25 πŸ”— rbraun but it happened some other times too; just reloading fixed it
09:25 πŸ”— godane my script uses curl to grab the sitemap by day then starts the download
09:26 πŸ”— rbraun why not cat those together like a month at a time?
09:27 πŸ”— godane cause i was not planning on doing that
09:27 πŸ”— rbraun well, the reason i ask is
09:27 πŸ”— rbraun the sitemaps provide an index of article titles
09:28 πŸ”— rbraun so if i know gawker published an article in 1/2010 but i don't know which day...
09:28 πŸ”— rbraun and i only know one word of the title or something
09:28 πŸ”— godane https://archive.org/details/archiveteam-fire?and[]=subject%3A%22www.dailymail.co.uk%22
09:28 πŸ”— godane i do it by date of sitemap
09:29 πŸ”— rbraun it's also easier to verify everything is in there if it's in larger chunks
09:29 πŸ”— vOYtEC has quit IRC (Ping timeout: 244 seconds)
09:29 πŸ”— godane if i make a monthly sitemap it may confuse me
09:30 πŸ”— godane thinking it was done with the old method, where the gawker sitemap doesn't get everything
09:30 πŸ”— godane so the daily dumps are meant to be different, since the monthly and yearly grabs failed
09:31 πŸ”— godane i can turn the daily dumps into monthly or yearly for that reason
09:32 πŸ”— rbraun hmm ok
09:34 πŸ”— godane i'm mostly trying to keep the raw sitemap urls the same set as date of urls
09:34 πŸ”— HCross2 has quit IRC (Quit: Connection closed for inactivity)
09:37 πŸ”— schbird has joined #archiveteam
09:37 πŸ”— schbird is there a way to record mouse/keyboard interaction with webrecorder.io or a similar tool?
09:37 πŸ”— rbraun godane: can your script handle the case where it returns a 500 error and retry?
09:38 πŸ”— schbird to actually replay all "user" interaction
09:38 πŸ”— rbraun godane: i guess curl --retry 10 or something
09:41 πŸ”— rbraun i was getting those intermittently even on sitemap_bydate requests that would later complete
09:49 πŸ”— godane i'm not really getting those errors
09:49 πŸ”— godane i get them on days that don't exist i think
09:49 πŸ”— rbraun no, i get empty files (or with the front page only) on days that don't exist
09:50 πŸ”— rbraun i get 500 errors when it's cranky or if i try to pull a partial day (which doesn't work)
09:50 πŸ”— rbraun but in the former case i had to retry a few times
09:50 πŸ”— rbraun if you pass --retry <#> to curl with some number of retries allowed, you should have no problem though
09:52 πŸ”— rbraun only getting that on the sitemaps occasionally, not the actual pages
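
The retry idea rbraun suggests (curl --retry 10) carries over directly if the grabs are scripted in Python instead of curl; a sketch using requests with urllib3's Retry, where the retry count, backoff, and status list are only illustrative.

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    def make_session(retries=10):
        """A session that transparently retries the intermittent 500s from the sitemap endpoint."""
        retry = Retry(total=retries, backoff_factor=1,
                      status_forcelist=[500, 502, 503, 504])
        session = requests.Session()
        session.mount("http://", HTTPAdapter(max_retries=retry))
        session.mount("https://", HTTPAdapter(max_retries=retry))
        return session

    # usage:
    # make_session().get("http://gawker.com/sitemap_bydate.xml",
    #                    params={"startTime": "2010-01-01T00:00:00",
    #                            "endTime": "2010-01-01T23:59:59"})
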
09:52 πŸ”— Selavi has quit IRC (Ping timeout: 260 seconds)
09:53 πŸ”— Kksmkrn has joined #archiveteam
09:53 πŸ”— Kksmkrn has quit IRC (Connection closed)
09:53 πŸ”— Kksmkrn has joined #archiveteam
09:53 πŸ”— godane i'm going to bed now
09:53 πŸ”— godane i will continue tomorrow
09:54 πŸ”— rbraun ok good night
10:00 πŸ”— Selavi has joined #archiveteam
10:09 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
10:16 πŸ”— BartoCH has joined #archiveteam
10:28 πŸ”— enr1c0 has joined #archiveteam
10:31 πŸ”— Kksmkrn has quit IRC (Quit: leaving)
11:00 πŸ”— enr1c0 has quit IRC (Quit: ZNC 1.6.3+deb1 - http://znc.in)
11:00 πŸ”— enr1c0 has joined #archiveteam
11:01 πŸ”— enr1c0 has left
11:26 πŸ”— enr1c0 has joined #archiveteam
11:30 πŸ”— enr1c0 has quit IRC (Client Quit)
11:31 πŸ”— enr1c0 has joined #archiveteam
11:31 πŸ”— enr1c0 has left
11:35 πŸ”— HCross has quit IRC (Ping timeout: 246 seconds)
11:35 πŸ”— HCross has joined #archiveteam
12:28 πŸ”— irl has joined #archiveteam
12:29 πŸ”— irl ok, so i was here a while ago and i'm trying to archive a whole bunch of paper manuals and documents from the 70s-90s from obscure networking hardware and computer programs relating to networking and such
12:30 πŸ”— irl following a complete mess trying to use the university's MFD devices (they scan to email only, and couldn't do large attachments, so i was limited to ~5 pages)
12:30 πŸ”— irl i've now decided i want to buy a scanner with an ADF to sit in the lab
12:30 πŸ”— irl can anyone recommend such a scanner that can handle various paper types, and paper with binding holes etc. that isn't going to break constantly?
12:31 πŸ”— irl ideally it would have linux support and not be networked, but direct into the pc
12:31 πŸ”— irl ideally it would also be fast-ish, but i'll take reliability over speed
12:32 πŸ”— PurpleSym I recently *built* a €25 DIY book scanner, but it's quite slow.
12:32 πŸ”— irl i'm talking ~10,000 ish pages of manuals
12:32 πŸ”— irl they're mostly A4 paper that's been punched and hand-bound
12:32 πŸ”— PurpleSym So, destructive scanning then?
12:33 πŸ”— irl with those plastic binding things
12:33 πŸ”— PurpleSym I see.
12:33 πŸ”— irl my hope is to be able to just put the plastic things back on them afterwards
12:34 πŸ”— irl i've looked through ebay for scanners with adf, but i have no idea how reliable these things are
12:35 πŸ”— irl the HP 9200C 9200 Digital Sender seems to come up a lot and looks quite heavy duty
12:35 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
12:37 πŸ”— BartoCH has joined #archiveteam
12:46 πŸ”— irl purchased a 9200c, seems to have good reviews
12:47 πŸ”— irl i'm guessing a lot of these things will have valid copyright
12:47 πŸ”— irl any advice on how to work out what i can publish and what i shouldn't publish?
12:48 πŸ”— irl is there a place i can stash things until the copyright expires?
12:50 πŸ”— joepie91 irl: IA :)
12:50 πŸ”— joepie91 irl: IA will dark things if they get complaints
12:51 πŸ”— joepie91 where 'dark' === "it's still in the archives but not publicly accessible"
12:51 πŸ”— joepie91 (also you might want to talk to SketchCow regarding manuals)
12:51 πŸ”— atomotic has joined #archiveteam
13:03 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
13:04 πŸ”— BlueMaxim has quit IRC (Quit: Leaving)
13:04 πŸ”— BartoCH has joined #archiveteam
13:21 πŸ”— irl joepie91: ah cool (:
13:21 πŸ”— irl so i can basically automate most of this then using scanner->ftp->git-annex-assistant->ia
13:21 πŸ”— irl just need to get the right metadata in the right places
13:21 πŸ”— irl SketchCow: i might want to talk to you
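
The tail end of the scanner -> ftp -> git-annex -> IA pipeline irl sketches above can be automated with the internetarchive Python library; the item identifier, filenames, and metadata below are hypothetical placeholders, and (as joepie91 notes) anything with unclear copyright can still be darked by IA later.

    from internetarchive import upload

    # hypothetical item and metadata for one scanned manual
    item_id = "example-networking-manual-1984"
    metadata = {
        "mediatype": "texts",
        "title": "Example Networking Hardware Manual (1984)",
        "subject": "networking; manuals",
    }

    # credentials come from running `ia configure` beforehand
    responses = upload(item_id, files=["manual.pdf"], metadata=metadata)
    print([r.status_code for r in responses])
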
13:24 πŸ”— WinterFox has quit IRC (Read error: Operation timed out)
13:27 πŸ”— beardicus has quit IRC (bye)
13:27 πŸ”— atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
13:28 πŸ”— dashcloud has quit IRC (Read error: Operation timed out)
13:31 πŸ”— beardicus has joined #archiveteam
13:31 πŸ”— swebb sets mode: +o beardicus
13:35 πŸ”— beardicus has quit IRC (Client Quit)
13:37 πŸ”— beardicus has joined #archiveteam
13:37 πŸ”— swebb sets mode: +o beardicus
13:45 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
13:45 πŸ”— BartoCH has joined #archiveteam
13:46 πŸ”— wp494 has quit IRC (Read error: Operation timed out)
13:47 πŸ”— dashcloud has joined #archiveteam
14:42 πŸ”— tomwsmf has joined #archiveteam
14:47 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
14:52 πŸ”— irl_ has joined #archiveteam
14:53 πŸ”— irl has quit IRC (Quit: WeeChat 1.5)
14:53 πŸ”— irl_ is now known as irl
14:56 πŸ”— irl has quit IRC (Client Quit)
14:57 πŸ”— irl has joined #archiveteam
15:00 πŸ”— irl SketchCow: if you're interested in old manuals, i can get you a list of the things we maybe have
15:01 πŸ”— nicolas17 has joined #archiveteam
15:01 πŸ”— irl SketchCow: i'm now idling here via znc, so i'll see when you respond as i guess you're not around right now
15:02 πŸ”— irl i'll be at debian uk bbq eating burgers this weekend, but will probably start a go at this the following weekend
15:02 πŸ”— irl (slow start - not diving in)
15:15 πŸ”— wp494 has joined #archiveteam
15:18 πŸ”— JesseW has joined #archiveteam
15:23 πŸ”— schbird has quit IRC (Read error: Operation timed out)
15:25 πŸ”— JesseW has quit IRC (Read error: Operation timed out)
15:34 πŸ”— BartoCH has joined #archiveteam
15:56 πŸ”— VADemon has joined #archiveteam
16:13 πŸ”— SketchCow Hugs to irl
16:35 πŸ”— irl hello
16:35 πŸ”— irl SketchCow:
16:35 πŸ”— irl still here?
16:42 πŸ”— HCross2 has joined #archiveteam
16:42 πŸ”— SketchCow Yep.
16:43 πŸ”— SketchCow So much talking. Come to #archiveteam-bs
17:00 πŸ”— tomaspark has quit IRC (Ping timeout: 255 seconds)
17:02 πŸ”— db48x` is now known as db48x
17:06 πŸ”— arkiver bayimg is online again
17:06 πŸ”— arkiver I restarted the script
17:06 πŸ”— arkiver it's not yet in the warrior
17:06 πŸ”— arkiver http://tracker.archiveteam.org/bayimg/
17:06 πŸ”— arkiver * restarted the projects
17:06 πŸ”— arkiver project*
17:18 πŸ”— SketchCow OK SO FINALLY
17:19 πŸ”— SketchCow http://fos.textfiles.com/pipeline.html is in version 1.0. It'll run once a day (with an indication of when it was run). It's NOT real-time, it's just a way for your nerds to notice what's going on on the site, and be able to communicate with me or each other on a status.
17:20 πŸ”— SketchCow It's Inbox --> Outbox --> IA, and if there's interruptions at IA, the Outbox might fill and "work" but will leave some items untouched.
17:20 πŸ”— arkiver some projects seem to be missing
17:21 πŸ”— SketchCow It's generating right now, but Orkut is such a nightmare, it will sit there for a while. I added another black-label "line" at the bottom of the table so you can see the difference between "running" and done. Looks like 10-15 minutes of disk thrashing to get through the mess.
17:21 πŸ”— arkiver I see
17:21 πŸ”— SketchCow In the future, when it has the second black line at the bottom, if it's not there, it's not in the pipeline.
17:22 πŸ”— SketchCow The script in the future will probably run in 5 minutes, as long as insanities like orkut aren't going on.
17:23 πŸ”— SketchCow So for example, the WHOLE pipeline is backed up (google code is at 187g) because of Orkut
17:23 πŸ”— arkiver Yep
17:23 πŸ”— SketchCow But at least now, in the future, one of you can go "Hey, looks like boombox project is at 300gb for some reason" and we can jump on that.
17:24 πŸ”— SketchCow Or "it's time to add an upload script to this or that project"
17:24 πŸ”— arkiver orkut is going down in 8 days, so just a little more time
17:26 πŸ”— DigDug i thought orkut was long gone
17:26 πŸ”— arkiver still here as an archive https://orkut.google.com/en.html
17:27 πŸ”— Kaz are we on track to finish orkut? I have more available if FOS can handle it, if needed.
17:27 πŸ”— arkiver I think we're going to make it
17:27 πŸ”— arkiver Tomorrow or the day after we're going to retry the larger communities, so you might have to do a little less concurrent
17:28 πŸ”— Kaz nod
17:28 πŸ”— arkiver But I'll warn you before we do that
17:28 πŸ”— arkiver the larger communities can be millions of posts
17:28 πŸ”— arkiver (and URLs)
17:29 πŸ”— SketchCow So, the script is going to finish running, and I'm going to make two improvements.
17:29 πŸ”— SketchCow First, it will not copy over the finished .html file until it's 100% done, so in the future, it's just "there" and not "in progress"
17:30 πŸ”— SketchCow Second, I'm going to make a "cheat sheet" (which I will occasionally forget to update) that will change the "Project" name into something better.
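
The first improvement (only publishing the .html once it is 100% done) is the usual write-then-rename pattern; a tiny sketch, with the destination path made up.

    import os, tempfile

    def publish_atomically(html, dest="/var/www/pipeline.html"):
        """Write to a temp file in the same directory, then rename it into place.
        Readers see either the old page or the finished new one, never a partial file."""
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest), suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.write(html)
        os.replace(tmp, dest)  # atomic rename on POSIX filesystems
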
17:30 πŸ”— nicolas17 I tried archiving orkut and it seemed like you didn't need more nodes
17:30 πŸ”— nicolas17 since most of the time I got rate-limiting by the tracker anyway
17:31 πŸ”— nicolas17 so the download rate was limited by that setting, not by how many people were running the warrior
17:42 πŸ”— AlexLehm has joined #archiveteam
17:54 πŸ”— SketchCow http://fos.textfiles.com/pipeline.html just finished.
17:55 πŸ”— SketchCow NOW you can rain down questions
18:00 πŸ”— schbird has joined #archiveteam
18:25 πŸ”— pfallenop has quit IRC (Ping timeout: 260 seconds)
18:25 πŸ”— pfallenop has joined #archiveteam
18:30 πŸ”— schbird has quit IRC (Read error: Operation timed out)
18:34 πŸ”— Zialus has quit IRC (Read error: Operation timed out)
18:34 πŸ”— HCross2 arkiver: let me know when, and I'll reduce my quarter of a trillion concurrent
18:38 πŸ”— Zialus has joined #archiveteam
18:40 πŸ”— arkiver SketchCow: it would be nice if it also shows megaWARC size
19:19 πŸ”— VerifiedJ has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
19:24 πŸ”— SketchCow Not really easy to do that, since stuff will be either out or stuck.
19:24 πŸ”— SketchCow Oh wait.
19:24 πŸ”— SketchCow Mmm, let me see
19:28 πŸ”— SketchCow I got it working. It's re-running and it'll update with it after it's done.
19:29 πŸ”— SketchCow (almost all are 40gb but it's trivial to print it)
19:29 πŸ”— SketchCow if someone wants to be a hero and wiki all this, go ahead
19:52 πŸ”— BartoCH has quit IRC (Ping timeout: 260 seconds)
19:58 πŸ”— BartoCH has joined #archiveteam
20:39 πŸ”— ats one that needs crawling for the magazines collection: http://www.muzines.co.uk
20:39 πŸ”— ats sadly it has a stupid obnoxious Javascript-based interface...
20:47 πŸ”— schbirid seems to work mostly fine without js here
21:04 πŸ”— HCross2 has quit IRC (Quit: Connection closed for inactivity)
21:13 πŸ”— Morbus has joined #archiveteam
21:15 πŸ”— VerifiedJ has joined #archiveteam
21:19 πŸ”— schbird has joined #archiveteam
21:27 πŸ”— VerifiedJ GTAGaming.com's database was compromised and they may be thinking about shutting the website down along with www.gta4-mods.com. http://www.gtagaming.com/news/comments.php?i=2369
21:40 πŸ”— Honno has quit IRC (Read error: Operation timed out)
21:50 πŸ”— VerifiedJ has left
21:58 πŸ”— vOYtEC has joined #archiveteam
21:59 πŸ”— schbird has quit IRC (Leaving)
22:07 πŸ”— schbirid2 has joined #archiveteam
22:10 πŸ”— schbirid has quit IRC (Read error: Operation timed out)
22:33 πŸ”— RichardG has joined #archiveteam
22:42 πŸ”— schbirid2 has quit IRC (Read error: Operation timed out)
22:45 πŸ”— schbirid2 has joined #archiveteam
22:47 πŸ”— AlexLehm has quit IRC (Ping timeout: 260 seconds)
23:16 πŸ”— JW_work1 has joined #archiveteam
23:18 πŸ”— JW_work has quit IRC (Read error: Operation timed out)
23:23 πŸ”— RichardG has quit IRC (Read error: Operation timed out)
23:28 πŸ”— SketchCow Who here can read an ext3 disk and is comfortable with possibly having to do a dd and then extracting data?
23:28 πŸ”— SketchCow US preferred
23:29 πŸ”— nicolas17 you mean a physical disk, or?
23:36 πŸ”— SketchCow Physical, here in front of me.
23:37 πŸ”— * nicolas17 is physically too far
23:37 πŸ”— Frogging what's involved in it? i.e. why can't you do it?
23:37 πŸ”— rchrch has joined #archiveteam
23:37 πŸ”— SketchCow Don't want to
23:37 πŸ”— Frogging ah
23:37 πŸ”— SketchCow If you're asking what's involved, you're not for the job
23:38 πŸ”— nicolas17 well, he's asking eg. is it a corrupted ext3 you have to recover things out of, or just a clean filesystem but you have no Linux? :P
23:38 πŸ”— Frogging yeah, basically^
23:39 πŸ”— Frogging I can do magic with block devices but I'm not so good at fixing physically broken disks
23:39 πŸ”— SketchCow Not broken
23:40 πŸ”— Frogging ah. can you ship?
23:43 πŸ”— Frogging I assume so because you said US preferred. I'm in Canada though. But if nobody closer wants to then I volunteer
23:44 πŸ”— Frogging I enjoy this sort of thing
23:45 πŸ”— kristian_ has joined #archiveteam
23:48 πŸ”— RichardG has joined #archiveteam
23:56 πŸ”— Stiletto has quit IRC (Ping timeout: 246 seconds)
23:57 πŸ”— SketchCow You're in line
23:57 πŸ”— SketchCow We'll see if anyone else in the US wants it.
23:58 πŸ”— SketchCow I can sustain a canadian mailing
