#archiveteam-bs 2012-07-06,Fri


Time Nickname Message
03:31 🔗 joepie91 does anyone else here have that annoying feeling that you're the only person that can 'just get shit done' without cutting corners and without unnecessarily overcomplicating shit?
05:08 🔗 S[h]O[r]T i have that feeling but not in that context at all
05:08 🔗 S[h]O[r]T sometimes i feel like im the only one who can get shit done without everyone else thinking it cant be done or we dont have the resources or they are too retarded to bother
05:09 🔗 shaqfu I get that feeling about my entire profession
05:12 🔗 S[h]O[r]T shaqfu probably thought that about me the other day trying to manipulate a text file into excel and doing all kinds of crazy find and replaces. and he does like 2 things in bash script :P
05:14 🔗 shaqfu Nah :P
05:14 🔗 shaqfu 90% of that was schbirid anyway
05:15 🔗 shaqfu But 99% of moderately difficult problems in the archives world is "pay someone else" or "too hard, don't do it"
05:19 🔗 Coderjoe like the LOC MARC21 catalog situation... they pay someone else to handle it, and if you want access, you also have to pay up.
05:19 🔗 chronomex WORKS 4 ME
05:19 🔗 shaqfu Report went up for web archiving; 2/3 of people that do it, use Archive-It
05:19 🔗 shaqfu 2% use wget :(
05:19 🔗 Coderjoe command lines are hard. let's go shopping!
05:20 🔗 chronomex shopping for bookshelves or some shit
05:20 🔗 shaqfu Pay IA to use the same goddamn tools we can use free!
05:20 🔗 shaqfu It's not like they publicly release their spiders or something wacky like that
05:21 🔗 Coderjoe hey, at least IA get some income from it. It unfortunately costs a lot to keep that amount of storage online with that much bandwidth
05:21 🔗 shaqfu Someone's gotta pay for underscor's liquor
05:22 🔗 shaqfu chronomex: Those are usually bought second-hand
05:23 🔗 shaqfu Although I honestly don't have a clue who gets rid of mobile shelving; it's not like it goes obsolete
05:24 🔗 shaqfu The only library I know of that can conspicuously consume is Harvard; maybe they supply everyone else... :P
05:34 🔗 Coderjoe I'd like to have some of those movable shelves that run on rails
05:35 🔗 Coderjoe you essentially have one aisle that you move around to where you need it
05:37 🔗 underscor With Archive.it you're paying to have a GUI that anyone can use
05:37 🔗 underscor and also storage
05:37 🔗 underscor and support
05:37 🔗 underscor It's sorta like you can run centos or RHEL w/ support contract
05:37 🔗 underscor Also, Heritrix IS open source
05:37 🔗 underscor The same exact version they use.
05:38 🔗 shaqfu Coderjoe: Pity it's insanely expensive just for the hardware, let alone installing it
05:38 🔗 Coderjoe indeed
05:38 🔗 underscor Coderjoe: Yeah, those things are so cool
05:38 🔗 shaqfu Odds are good you need a reinforced floor
05:38 🔗 underscor Museum of country music or whatever it's called in Nashville had them in their archives
05:38 🔗 underscor fun to watch
05:38 🔗 Coderjoe you mean concrete isn't good enough?
05:39 🔗 shaqfu Coderjoe: It probably is
05:39 🔗 shaqfu I always felt like I was going into a secret vault when using them
05:39 🔗 shaqfu Turning some big crank and opening a wall
05:39 🔗 underscor * kmm has changed topic for #303 to: "i just met you, and this is crazy, but here's my boner, tug it maybe"
05:40 🔗 underscor hahahahaha
05:40 🔗 Coderjoe i certainly wouldn't try putting them on a standard 2x6 joist + 2 plywood layer home floor
05:40 🔗 shaqfu underscor: The storage/support is probably worth it, but a lot of these places are big unis that should have the ability/hardware on hand
05:40 🔗 underscor shaqfu: It's probably cheaper over all though
05:40 🔗 Coderjoe shaqfu: why reinvent the wheel? archive.it has everything set up nice and easy
05:40 🔗 underscor ^
05:41 🔗 Coderjoe rather than have some staffer figure things out and then try and pass that on while handling the inevitable support queries
05:41 🔗 shaqfu Coderjoe: If you have your own repository for stuff, and you have someone that can run Heritrix, why not have them do it vs. paying someone else
05:41 🔗 Coderjoe cost
05:41 🔗 underscor shaqfu: because archive-it does a lot more than just run Heritrix
05:41 🔗 Coderjoe the employee's time likely costs more than what archive-it charges
05:42 🔗 Coderjoe and the employee(s) can then be doing other stuff
05:42 🔗 shaqfu Coderjoe: What's the charge? I figured it was low, but not lower than a few hours per week of someone's time
05:42 🔗 underscor I should see if I can get a demo account to show you guys
05:42 🔗 Coderjoe I have no idea
05:42 🔗 underscor shaqfu: Heritrix gives you a pile of warcs. Then what?
05:42 🔗 underscor How do you make that accessible to your constituents?
05:42 🔗 underscor What about focused crawls where only one department should have access?
05:43 🔗 shaqfu underscor: Devise a sensible versioning system, store it long-term, copy over the copied data and toss it on a server somewhere
05:43 🔗 underscor What if you want z department only to crawl a whitelist of domains, x department to be able to crawl anything, and y to have all requests approved by x
05:44 🔗 underscor shaqfu: I can see I'm not going to win here
05:44 🔗 shaqfu underscor: I'm listening, really
05:44 🔗 underscor Why do people pay for google apps then?
05:44 🔗 underscor When you can just run your own mail server
05:45 🔗 underscor or EC2, when you can run your own hardware
05:45 🔗 shaqfu EC2 is for huge-scale work, and a mailserver is much harder than pulling down sites
05:45 🔗 underscor archive-it is not only about pulling down sites.
05:45 🔗 underscor it's about storage, retrieval, and display
05:45 🔗 Coderjoe cost again. it costs money to have someone manage the server(s)
05:45 🔗 underscor We even have partners that upload their own WARCs to archive it
05:46 🔗 underscor Just for the storage/display/user controls/permissions stuff
05:46 🔗 underscor It's a lot of work to set up an environment that can just spin up an arbitrary set of WARCs
05:46 🔗 underscor from a nice gui
05:46 🔗 shaqfu The storage I get if you're either too small to afford good infrastructure or don't have it ready, but places that do, I don't see the incentive
05:47 🔗 shaqfu Although yeah, display and retrieval are awesome
05:48 🔗 shaqfu Admittedly, I could just be bitter and lumping Archive-It with any number of unnecessary paid services places use, which is insanely unfair
05:48 🔗 underscor You should try it :)
05:48 🔗 underscor I'll talk to the people tomorrow
05:48 🔗 underscor see if I can get a demo session token set up for you guys to fiddle with for a week
05:49 🔗 shaqfu But if you're a big uni and paying for web + email + av + research, at some point you have to just say "this is enough"
05:49 🔗 underscor I think they got to the point from the other side
05:49 🔗 underscor They were running their own web + email +av +research, and it was too much overhead
05:50 🔗 shaqfu underscor: Could be; it just seems cheaper to centralize and hire one or two people vs. outsourcing many services
05:50 🔗 underscor I suppose it's not, though, because they would if it was :P
05:50 🔗 shaqfu And I don't think every outsourced solution is as kind as IA :)
05:50 🔗 underscor Oh, definitely.
05:50 🔗 underscor Most uni's don't pay for email, though
05:50 🔗 shaqfu underscor: I had a uni archivist tell me once that email was "just too hard"
05:50 🔗 underscor gapps for universities is free
05:51 🔗 Coderjoe I take it you've been quite insulated from financial accounting at your places of employ
05:51 🔗 underscor Also, having full text search WITHIN archived content is pretty nice
05:51 🔗 shaqfu So I think a lot of places are just letting it go fallow, and haven't gotten to the point of dealing with everything at once
05:51 🔗 underscor and is definitely something that plain Heritrix won't give you
05:51 🔗 shaqfu Coderjoe: Yes, I have (outside of the usual "we're poor again")
05:52 🔗 shaqfu underscor: couldn't you just grep across everything not in tags?
05:52 🔗 underscor Grep across 4,281,459,653 warcs?
05:53 🔗 shaqfu Point taken
05:53 🔗 chronomex archiveteam: purveyor of Reasonably Big Data
05:54 🔗 underscor :D
05:54 🔗 Coderjoe to run their own service, they would have to pay for: development time, hardware costs, running-hardware costs, maintainer time, support time...
05:54 🔗 chronomex pager duty time
05:54 🔗 underscor or they could pay $x for all you can eat
05:54 🔗 underscor It's actually not like that, it's mildly tiered
05:54 🔗 underscor But it's still very cheap, comparatively
05:55 🔗 shaqfu Coderjoe: Again, these are places that would start seeing economy-of-scale benefits by taking on the task themselves
05:55 🔗 Coderjoe and all that people time includes not just wage/salary, but also taxes, FICA, possibly health insurance
05:55 🔗 underscor shaqfu: but not compared to IA's scale
05:55 🔗 shaqfu underscor: Oh, absolutely not
05:55 🔗 Coderjoe and not get any actual income on it to cover those costs
05:56 🔗 underscor IA's margins on archive-it are pretty thin, so I think that even with the added margins, a-i is still cheaper than any uni would ever need
05:56 🔗 underscor scalewise
05:56 🔗 underscor One thing that really bugs me is the inconsistent theming
05:56 🔗 underscor http://www.archive-it.org/organizations/369 vs http://wayback.archive-it.org/2344/*/http://action.aclu.org/
05:56 🔗 shaqfu underscor: For that one task, yes. I'm thinking of what happens when more and more gets added, if it's still reasonable to pay for it
05:57 🔗 Coderjoe it is a discussion I have had a few times with my boss (very small company) : any time spent on IT stuff is not being paid for by an outside source. however, that time needs to be taken in order for the paying work to get done.
05:58 🔗 Coderjoe and if the turnkey solution is less expensive than the expected roll-your-own costs, almost any accountant will push the turnkey
05:58 🔗 Coderjoe (depending on if it does what is needed, doesn't have security/privacy problems, etc)
05:58 🔗 shaqfu Hm, I'm curious if there are places that handle NSF research data for a fee, given that every uni in the US has to deal with it
06:00 🔗 shaqfu I honestly hope I'm wrong, and that archives outsourcing services/storage is a legitimate solution, and not a symptom of "too hard" thinking
06:01 🔗 winr4r wow, "scalewise" highlights the window for containing my name
06:02 🔗 underscor How does scalewise contain winr4r?
06:02 🔗 shaqfu But I'm concerned when large research unis, that have the infrastructure in place to add more long-term storage, do it
06:02 🔗 shaqfu If they're doing it because that's honestly and truly the best solution, or if it's due to short-term thinking
06:03 🔗 Coderjoe and archives generally have less funding to spend on development costs than corporations do
06:03 🔗 winr4r underscor: lewis
06:03 🔗 winr4r good morning, folks
06:03 🔗 shaqfu Coderjoe: Where I was, anything involving computers was handled by a different department, which got whatever it wanted
06:04 🔗 Coderjoe and I thought we were talking about more than just storage, but all of the software that goes into archive-it
06:04 🔗 underscor winr4r: aha. my client only highlights on nick
06:04 🔗 shaqfu Coderjoe: Certainly, but again, there's a tipping point
06:06 🔗 shaqfu If you're just doing web stuff, then it makes sense to use A-It. But if you do that for *everything* involving computers, that's a problem
06:10 🔗 shaqfu Dunno; maybe it'll shake out once places stop using /dev/null to archive email...
06:23 🔗 chronomex underscor: the latter link http://wayback.archive-it.org/2344/*/http://action.aclu.org/ totally matches the styling of every library website ever
06:23 🔗 underscor lol
06:23 🔗 chronomex compare: http://www.spl.org/
06:26 🔗 underscor roundrects!
08:59 🔗 omf_ library do seem uninspired
08:59 🔗 omf_ sites
08:59 🔗 omf_ I just checked my local one and it looks the same
08:59 🔗 omf_ it is kinda like wikis in that way
09:00 🔗 omf_ almost all wiki sites look the same
09:00 🔗 omf_ and frankly I think that is boring
09:06 🔗 Coderjoe why can't websites be simple? why must they be a vomit of flash and javascript and busy colors?
09:24 🔗 chronomex nobody gives money to blue rectangles
09:25 🔗 chronomex like seriously, look at these guys: http://www.berkshirehathaway.com/
09:27 🔗 Coderjoe indeed. nobody gives money to them!
10:02 🔗 omf_ in the corporate environment minimal is always best
10:02 🔗 omf_ did any of you see that netflix aws panel that got open sourced
10:02 🔗 omf_ way better design than the amazon interface
10:07 🔗 omf_ eye bleed warning: http://emporiumchicago.com/
11:20 🔗 godane uploading episode 127 of dl.tv
11:20 🔗 godane starting the next batch of uploads
11:40 🔗 godane uploaded: http://archive.org/details/dltv_127_episode
11:53 🔗 godane uploaded: http://archive.org/details/dltv_128_episode
12:16 🔗 godane uploaded: http://archive.org/details/dltv_129_episode
12:34 🔗 godane uploaded: http://archive.org/details/dltv_130_episode
12:56 🔗 godane uploaded: http://archive.org/details/dltv_131_episode
12:56 🔗 omf_ has anyone tried scraping google with phantomJS
12:56 🔗 omf_ it or selenium gets us over the "real" browser hurdle
13:26 🔗 godane http://archive.org/details/dltv_132_episode
13:26 🔗 godane full dvd 2 of dl.tv is uploaded
13:26 🔗 godane :-D
13:35 🔗 SketchCow Wow, I REALLY need to fix the spam issue on the wiki.
13:38 🔗 S[h]O[r]T it bypasses the re-captcha eh?
13:38 🔗 S[h]O[r]T or completes it
13:42 🔗 godane uploaded: http://archive.org/details/dltv_133_episode
14:00 🔗 godane uploaded: http://archive.org/details/dltv_134_episode
14:16 🔗 godane uploaded: http://archive.org/details/dltv_135_episode
14:51 🔗 godane uploaded: http://archive.org/details/dltv_137_episode
14:51 🔗 godane uploaded: http://archive.org/details/dltv_136_episode
14:52 🔗 ersi Busy man
14:52 🔗 Schbirid interesting, the googlebot seems to download my bigger forumplanet warcs partially. ~16mb each
14:57 🔗 godane ersi: I still have another 25gb of dl.tv
14:57 🔗 godane ersi: all of crankygeeks is up there
15:13 🔗 godane uploaded: http://archive.org/details/dltv_138_episode
15:16 🔗 godane i think the format changed with episode 139
15:17 🔗 godane its big res and smaller file
15:17 🔗 godane i only know this cause the last episode was 50:18 and 228.2mb in size
15:18 🔗 godane episode 139 is 50:14 but 203.2mb in size
15:18 🔗 godane also looks more wide-screen
15:45 🔗 godane uploaded: http://archive.org/details/dltv_139_episode
15:50 🔗 godane uploaded: http://archive.org/details/dltv_140_episode
15:52 🔗 SketchCow No need to keep updating us
15:53 🔗 SketchCow If I did that, this channel would be unusable
15:53 🔗 godane sorry
15:54 🔗 yipdw #godane
15:55 🔗 yipdw it doubles as a hashtag
15:55 🔗 SketchCow #godane-bs
15:59 🔗 ersi :D
16:00 🔗 godane i may be becoming like SketchCow with techtv videos
16:00 🔗 godane i wish you had an interview on techtv
16:09 🔗 BlueMax #sketchcow-bs would be a wasteland, since nothing he says is bs
16:10 🔗 BlueMax except "I will never take a picture of myself in a tutu"
16:19 🔗 yipdw FANFICTION/2/24/240/u/2405042/Must_Have_Yaoi/2405042.cooked.warc.gz
16:19 🔗 yipdw I wonder if that's still around
16:21 🔗 yipdw heh, I forgot how slow extracting one file out of a 50 GB tar is
16:28 🔗 godane how can 87% of splinder have been deleted?
16:29 🔗 godane there must have been a lot of small profiles
16:29 🔗 godane with next to zero pics
17:11 🔗 godane did you guys get this: http://www.theregister.co.uk/2007/02/26/microsoft_archive_goes_torrent/
17:38 🔗 yipdw I didn't
17:38 🔗 yipdw a five-year gap is kind of large
17:40 🔗 Coderjoe long tail torrents seldom work out
18:02 🔗 balrog yipdw: it's on groklaw.
18:02 🔗 balrog http://www.groklaw.net/staticpages/index.php?page=2007021720190018
18:02 🔗 balrog they're also working on transcribing it
18:25 🔗 Schbirid does anyone know a directory tree map tool (like baobab, filelight or sequoiaview) that accepts a textfile with one line per file location as input and maps that?
18:26 🔗 chronomex who is this hatman chump, anyway
18:26 🔗 chronomex he messaged me for some reason
18:39 🔗 Schbirid baobab, filelight, gdmap do not do it. jdiskreport supports saving and opening scan data but uses a binary format
18:39 🔗 Schbirid kdirstat looks promising, you can save a plaintext cache file
18:40 🔗 Schbirid yeah, awesome
18:41 🔗 Schbirid you can edit it just fine. checksums inside are not checked if you just open the cache file
18:46 🔗 Schbirid hm, this wont be easy
18:47 🔗 Coderjoe directory tree map tool?
18:47 🔗 Schbirid kdirstat wants one line specifying a directory and then the files as lines below
18:47 🔗 Schbirid yeah
18:47 🔗 Schbirid http://media.cdn.ubuntu-de.org/wiki/attachments/50/28/gdmap.png
18:47 🔗 Schbirid http://media.cdn.ubuntu-de.org/wiki/thumbnails/6/64/6431f4bdde74de4697fc08067034edf7617bbf08i250x.png
18:48 🔗 Coderjoe oh
18:49 🔗 Coderjoe another nifty tool in that arena (though without the graphical part) is ncdu, if at the console
18:49 🔗 Schbirid yeah i use that a lot
18:49 🔗 Coderjoe but it doesn't do cache files
18:49 🔗 Schbirid but the data i have are wget logs, not a filesystem
18:49 🔗 Schbirid would like to get an overview on what we have on fileplanet so far
19:27 🔗 Schbirid i am diving into sed-hell
19:28 🔗 omf_ what needs fixing?
19:48 🔗 Schbirid oh boy it works
19:48 🔗 Schbirid slowly
19:48 🔗 Schbirid terribly
19:48 🔗 Schbirid just like i do it all the time
19:49 🔗 Schbirid sed inplace ftw
20:19 🔗 winr4r SketchCow: i've just learned: when i buy your next documentaries please send them to me with the cheapest service you can
20:34 🔗 underscor I reat that as sex hell
20:34 🔗 underscor read*
20:34 🔗 underscor and got all excited, and then was disappoint when I reread
20:34 🔗 underscor :(
20:48 🔗 Schbirid poop, this is not trivial and i give up
20:50 🔗 Schbirid i have a list of paths to files
20:51 🔗 Schbirid i need to rework it so before each file that is in a new (sub-)directory there is a line with that directory
21:00 🔗 Schbirid if someone writes that (gnu tools) we could get something like http://i.imgur.com/OT7Et.png from wget -nv logs
21:00 🔗 Schbirid example data https://pastee.org/hyumg (the numbers at the end of lines are sizes, must be kept intact)
21:03 🔗 Schbirid example result https://pastee.org/tf98s
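[Editor's sketch] The transformation Schbirid describes (emit a directory line before the first file of each new directory, and keep every file line, size included, untouched) might look roughly like the following. The pasted example data is not reproduced in this log, so the input format here is an assumption: one file per line, written as "<path> <size>", with files from the same directory on adjacent lines. The function name add_directory_lines is hypothetical, and the output is not claimed to match kdirstat's actual cache file format.

```python
import posixpath


def add_directory_lines(lines):
    """Yield the input lines, preceded by a directory line whenever the directory changes.

    Assumed input format (hypothetical): "<path> <size>", one file per line.
    """
    previous_dir = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split from the right so that spaces inside the path survive.
        path, sep, _size = line.rpartition(" ")
        if not sep:
            path = line  # no size field found; treat the whole line as a path
        directory = posixpath.dirname(path) or "."
        if directory != previous_dir:
            yield directory + "/"
            previous_dir = directory
        yield line  # the file line, including its size, is passed through unchanged


if __name__ == "__main__":
    sample = [
        "fileplanet/a/one.zip 100",
        "fileplanet/a/two.zip 200",
        "fileplanet/b/three.zip 300",
    ]
    for out in add_directory_lines(sample):
        print(out)
```

If the input is not grouped by directory, the same directory line is emitted again each time the directory reappears, so sorting the list first is probably a good idea.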
21:04 🔗 Schbirid good night
22:35 🔗 winr4r SketchCow: (delayed) because in the UK apparently we pay tax in the amount that it cost to send it, not just on the value of the parcel
22:36 🔗 Coderjoe mmm
22:36 🔗 Coderjoe http://www.wzzm13.com/news/article/217608/14/Drivers-asked-to-detour-around-I-196-buckling
22:36 🔗 Coderjoe when I went through, it was only a 2" to 3" rise
22:36 🔗 Coderjoe now it's supposedly 8" to 10"
22:36 🔗 Coderjoe <3 the heat
22:40 🔗 winr4r i think it has less to do with the heat than it does with governments having less money to deal with the damage from heavy trucks
22:42 🔗 Coderjoe the rise was not there yesterday. at all.
22:43 🔗 winr4r which isn't contradictory to what i said
22:43 🔗 Coderjoe and looking at the picture they just updated with, I think the measurement given may be the length of the buckled section rather than the height.
22:45 🔗 winr4r roads deteriorate very quickly if they're used by heavy trucks and aren't maintained meticulously
22:45 🔗 winr4r it could have changed in that time
22:47 🔗 Coderjoe then why do these large buckling events only seem to happen in very hot weather?
22:55 🔗 Coderjoe and by heat today, we're talking 106F today, 102F yesterday, and high 90s on the 4th
23:04 🔗 SmileyG o_O
23:04 🔗 SmileyG Here in the UK our main problem is the tarmac cracks due to heavy trucks/buses
23:04 🔗 SmileyG that in itself isn't much of a problem, but then the water gets in, freezes and that destroys everything.
23:05 🔗 SmileyG While we've never had your kind of heat, I'm wondering if air is some how getting trapped under there and then heating up...
23:17 🔗 winr4r well, yes
23:18 🔗 winr4r Coderjoe: that's like saying "why does my pasta only grow larger when the water is hot"
23:19 🔗 winr4r it's an odd question
23:20 🔗 winr4r roads get damaged more in temperature extremes, when they're not very well maintained
23:25 🔗 Coderjoe http://i.imgur.com/8BKpN.jpg
23:26 🔗 winr4r Coderjoe: haha, that is awesome
23:26 🔗 winr4r i love that
23:26 🔗 Coderjoe you then admit it has more to do with the temperature extreme than ill maintenance?
23:27 🔗 winr4r Coderjoe: i'm saying that roads under loads of heavy trucks will deteriorate very quickly if they're not maintained meticulously, that's all
23:27 🔗 Coderjoe the road in question would have lasted until the next planned maintenance there had the temperatures not gotten so high today. (and I am sure the interior temperature of the blacktop was higher than the air temp of 107F)
23:28 🔗 winr4r smaller vehicles don't exert enough axle damage to even merit repairing them
23:29 🔗 winr4r according to a study that was done in the US, a 40 ton truck does 10,000 times as much axle damage as a 2-ton car
23:29 🔗 winr4r as i recall
23:30 🔗 Coderjoe what does damage to axles have to do with it?
23:30 🔗 winr4r Coderjoe: as in damage to the roads transmitted by the axles
23:34 🔗 winr4r so roads deteriorate very rapidly unless they are subsidised to well in excess of what trucks actually pay
23:35 🔗 winr4r but wat do i no lol
23:36 🔗 winr4r i am going to watch stupid videos then go to bed
