#archiveteam-bs 2013-08-20,Tue


Time Nickname Message
00:47 dashcloud what happened to torrentbytes?
00:49 dashcloud after a recent experience with VBA at work, I think I now know why there was so much VB crap, and why it was so popular
00:59 dashcloud it's surprisingly easy to slap together blocks of example/sample code and have them work together, even if you know little to nothing about the language. This leads to horribly written monstrosities that become key pieces of someone's workflow.
01:02 godane torrentbytes went offline for server upgrades
01:02 godane it was going to be taken offline completely before
01:02 godane i grabbed the forums anyways cause of this
03:00 yipdw dashcloud: I've had to deal with similar monstrosities in Excel
03:01 yipdw dashcloud: however, I've come to think that maybe the right way to "fix" that is to not eliminate the monstrosity, because it clearly works in some degree; rather, the right fix is to augment the monstrosity (at arm's length) with indexing, backup, query, etc.
03:01 yipdw dashcloud: or maybe that's just applicable to healthcare, I have no idea :P
03:02 dashcloud so, I have that experience because I actually created a VBA script to handle a specific set of tasks in Word that I couldn't find any free program to do
03:04 dashcloud I haven't done programming of any kind in many years, and I've never touched VBA before, yet with the help of "The Internet" and a little knowledge, I cobbled/hacked/etc a reasonably good script that solves the problem
06:47 omf_ "AWS is the overwhelming market share leader, with more than five times the compute capacity in use than the aggregate total of the other fourteen providers in this Magic Quadrant,"
06:47 omf_ included HP, GoGrid, SoftLayer, Fujitsu, Virtustream, Tier 3, and Joyent in the "niche player" section, and Dimension Data, Savvis, and Terremark in the top left "challenger" section.
06:47 omf_ also rackspace and microsoft
06:48 yipdw I can tell that they have "more than five times the compute capacity in use", because EC2 instances are always slow as fuck
06:48 yipdw unfortunately spot instances are cheap as hell so I keep using them :(
06:48 yipdw it's lucky I don't apply the same philosophy to my food, or I'd be poisoning myself at McDonald's
09:03 SmileyG GLaDOS: ?
09:04 SmileyG I can't log in to the box :O
09:04 SmileyG did you break it or ban me? :D
09:06 GLaDOS ..it went down literally just then.
09:07 GLaDOS "This dedicated server ks3099601.kimsufi.com is expired"
09:07 GLaDOS OH LOOK AT THAT, SAME ISSUE AS LAST TIME
09:08 SmileyG ffs
09:08 SmileyG kick off this time :! D:
09:17 GLaDOS "According to The Wall Street Journal, Alibaba.com is going for an IPO with a value of $70 Billion! Could this be an investment opportunity in Yahoo's stock, which holds 23% of Alibaba?"
09:18 GLaDOS Hahaha, worthless stock.
09:18 GLaDOS Also, "It's listed as active in our system. Are you unable to use it?"
09:18 GLaDOS Your system is WRONG.
09:19 GLaDOS Nice response time though.
10:40 godane i'm grabbing groklaw.net
10:43 omf_ Why? According to the cdx search that site has deep coverage going back to when it launched
10:43 ersi Why not?
10:43 ersi Doesn't hurt.
10:44 godane i know may know way i grabbed things by year
10:44 godane i got a read error at byte
10:45 godane then it does one retry and stops downloading
10:45 godane any way to stop that?
10:50 godane also i need comments to be flat
10:50 godane so i can grab them all
10:55 ersi By any chance, is anyone here capable of building regular expressions (regexps)?
10:55 omf_ if you have sample data
10:55 godane omf_: click on a story here: http://web.archive.org/web/20130102154112/http://groklaw.net/
10:56 godane none of the links are in wayback
10:56 godane they're going to the liveweb
10:57 godane omf_: also this will explain things better for you: http://web.archive.org/web/20130102154111/http://groklaw.net/robots.txt
10:58 godane anyways i need some help with how to get flat comments so we can get them all
10:58 ersi I'm trying to write a rewrite/replace rule that will replace the first / occurring in the URI with _ (but not the / root) like: http://hostname/prod/TEXT-234234-HD_3423423_34234/ -> http://hostname/prod_TEXT-234234-HD_3423423_34234/
10:59 ersi omf_: ^
10:59 ersi I tried something like.. "rewrite ^/([^/]*)/([^/]*)$ /$1_$2 break;" - that didn't work though
11:02 omf_ is it always going to be a dir that you are renaming?
11:03 ersi I think so, yes
11:03 ersi might as well update that there, considering the last topic ;D
11:04 ersi even better
11:08 omf_ (/)[\w-]+/$
11:08 omf_ then replace the captured / with _
11:08 godane 2013-08-20 07:07:59 (184 KB/s) - Read error at byte 27074/38322 (Connection reset by peer). Retrying.
11:08 godane fucking stopped again
11:09 godane i need help so when it does this it will keep going
11:12 ersi omf_: I'm not quite sure how I'd use that :o
11:12 omf_ Are you going to use the regex over a file you have?
11:13 ersi I'm using proxy_pass to another application
11:13 ersi Integration work :|
11:14 ersi App expects to get a url like http://hostname/prod_TEXT-234234-HD_3423423_34234/ - while the idiotic portal has links like http://hostname/prod/TEXT-234234-HD_3423423_34234/
11:14 ersi So I'm not doing anything besides rewriting the URL replacing the first instance of / in the URI
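
[Editor's sketch, not part of the log] ersi's rule probably failed because the sample URI ends in a trailing slash: the second group, ([^/]*), cannot cross that slash, so the $ anchor never matches. The Python below is only a sketch of one possible fix, letting the second group take the rest of the path. The names ORIGINAL, FIXED and join_first_segment are made up for illustration. The same pattern should carry over to nginx as something like "rewrite ^/([^/]+)/(.+)$ /$1_$2 break;", but that is untested here.

```python
import re

# ersi's original pattern: the second group cannot cross the trailing "/",
# so "$" never matches a URI shaped like /prod/NAME/.
ORIGINAL = re.compile(r"^/([^/]*)/([^/]*)$")

# Hypothetical fix: require a non-empty first segment, then let the second
# group take the rest of the path, trailing slash included.
FIXED = re.compile(r"^/([^/]+)/(.+)$")


def join_first_segment(path: str) -> str:
    """Replace the first "/" after the root with "_".

    Paths that do not match (for example "/") are returned unchanged.
    """
    return FIXED.sub(r"/\1_\2", path, count=1)


sample = "/prod/TEXT-234234-HD_3423423_34234/"
print(ORIGINAL.match(sample))      # None: the trailing slash blocks the match
print(join_first_segment(sample))  # /prod_TEXT-234234-HD_3423423_34234/
```

Only the first separator is rewritten; any deeper slashes in the path are left alone, which matches what ersi describes.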
11:53 godane can anyone help me?
11:54 godane i need to figure out how to grab all comments off of groklaw.net?
12:03 godane all i know is the POST method is used and the site refreshes to an article.php
12:33 Tephra WHAT, groklaw is going away?? what's happening to the world
14:37 godane looks like %0D everywhere in my pdf list of groklaw
14:37 godane :-(
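
[Editor's sketch, not part of the log] %0D is a percent-encoded carriage return. One plausible cause, assumed here rather than confirmed in the log, is that the list was built from text with CRLF line endings. A minimal cleanup under that assumption follows. The function name clean_url_list and the example.com entries are hypothetical, since the real list is not shown.

```python
def clean_url_list(lines):
    """Strip trailing carriage returns, raw or percent-encoded, from each URL."""
    cleaned = []
    for line in lines:
        url = line.strip()  # drops raw "\r", "\n" and surrounding spaces
        # Drop a trailing encoded CR ("%0D" or "%0d"), repeatedly if doubled.
        while url.lower().endswith("%0d"):
            url = url[:-3]
        if url:  # skip lines that end up empty
            cleaned.append(url)
    return cleaned


# Hypothetical entries standing in for the real PDF list.
urls = ["http://example.com/a.pdf%0D\r\n", "http://example.com/b.pdf\r\n", "\n"]
print(clean_url_list(urls))
```

This only touches a %0D at the end of a line; an encoded carriage return in the middle of a URL is left alone in case it is intentional.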
14:41 godane it just never dies
14:41 godane HELP
14:41 yipdw godane: it looks like a straightforward spidering will work on groklaw; it doesn't do anything fancy
14:41 yipdw godane: perhaps you're hitting it too fast
14:43 godane yipdw: i thought that too but then i get a byte error retry
14:43 godane if i get that it will end
14:43 godane so i have to stop the byte error retry
14:43 godane and i don't think a wait will fix it
14:43 godane also remember i have crap wifi
14:49 yipdw huh
14:49 yipdw "The information on Groklaw is not intended to constitute legal advice. While Mark is a lawyer and he has asked other lawyers and law students to contribute articles, all of these articles are offered to help educate, not to provide specific legal advice. They are not your lawyers."
14:49 yipdw who is Mark?
14:50 yipdw oh
14:50 yipdw Mark Webbink
14:50 yipdw never mind
14:50 godane i put a 0.2 wait on my script
14:51 godane i'm hoping for no byte error crap
14:51 yipdw if that doesn't work, try something with a real line
14:51 yipdw or a VPS, etc.
14:52 balrog there are pages only accessible with an account.
14:52 balrog fyi
14:54 yipdw " Sorry, creation of new accounts has been temporarily disabled "
14:54 yipdw well then
14:54 balrog they had problems with trolls :/
14:54 yipdw do you have one?
14:54 balrog maybe if you email pj she'll create an account for you
14:54 godane http://www.urbanterror.info/news/423-git-repository-hacked/
14:54 balrog I do
14:55 yipdw maybe you should run godane's grab
14:56 godane wget $website --mirror --warc-file=$website-$(date +%Y%m%d) --warc-cdx --reject-regex="(#|comment.php)" --warc-max-size=1G -H --domains=$website -w 0.2 -E -o wget.log
14:56 godane website="groklaw.net"
14:56 yipdw FYI, --mirror doesn't imply --page-requisites
14:57 yipdw you probably want that in there too
14:58 godane anything else before i start running it again
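
[Editor's sketch, not part of the log] godane pasted the command before the variable assignment, and yipdw suggests adding --page-requisites. The Python below only assembles the same invocation as an argument list with that one flag added. It does not run wget and makes no claim to fix the read error. The function name build_wget_args and the stamp parameter are hypothetical.

```python
import shlex
from datetime import date


def build_wget_args(website, stamp=None):
    """Return godane's wget invocation as an argument list, plus --page-requisites."""
    stamp = stamp or date.today().strftime("%Y%m%d")
    return [
        "wget", website,
        "--mirror",
        "--page-requisites",  # yipdw's suggestion; not implied by --mirror
        f"--warc-file={website}-{stamp}",
        "--warc-cdx",
        "--reject-regex=(#|comment.php)",
        "--warc-max-size=1G",
        "-H", f"--domains={website}",
        "-w", "0.2",
        "-E",
        "-o", "wget.log",
    ]


# Print a shell-quoted command line for inspection; nothing is executed.
print(shlex.join(build_wget_args("groklaw.net", "20130820")))
```

Keeping the arguments in a list avoids the shell-quoting problem around the --reject-regex value; the list could be handed to subprocess.run if actually running the grab were wanted.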
14:58 yipdw actually, I found another possibility
14:58 yipdw maybe we can just email her and ask her if she can donate a copy of all site data to IA
14:59 yipdw I'll do that
14:59 godane ok
14:59 godane still going to see about mirroring it
14:59 yipdw sure
14:59 balrog yipdw: I'm pretty sure she would agree to that
15:00 balrog though not user data obviously
15:07 godane it happened again
15:07 godane byte error
15:08 yipdw balrog: right, we're just looking for public data that's blocked by robots.txt
15:08 yipdw AFAIK, that includes comments
15:08 balrog on groklaw?
15:08 yipdw yeah
15:08 balrog and pdfs
15:08 yipdw unless I misread that robots file
15:08 yipdw yes, those too
15:08 balrog those are important
15:08 balrog well
15:08 balrog having a user account allows you to view all comments on one page
15:08 yipdw I can't find a good PGP public key for her, heh
15:08 balrog email her and ask for it
15:09 yipdw I'm not sure it makes sense to ask for a public key over a channel that's assumed compromised
15:09 yipdw I just won't encrypt this; it's not sensitive (yet)
15:10 yipdw by compromised I mean "someone could take it over"
15:10 balrog I searched the keyserver and found two old 1024bit keys
15:10 balrog which I wouldn't trust
15:10 yipdw I found a few, yeah
15:10 balrog one's expired, the other isn't set to expire
15:11 balrog you can email here mykolab address
15:11 balrog her*
15:11 yipdw yeah, I'm sending to both
15:18 yipdw sent
17:07 omf_ |ω・)
17:14 Smiley :)
17:14 Smiley hahah omf_ well said
17:14 Smiley official line, back that shit up
17:15 omf_ I am thinking groklaw might "come back" in the future
17:15 balrog I hope so.
17:15 omf_ They were supposed to close a bunch of times before and it didn't happen
17:16 omf_ someone else might create a new site
17:16 Smiley nod
17:16 Smiley but that won't... save the existing stuff
17:16 Smiley not that i ever really follow it, it confuses and scares the fuck outta me
17:20 Schbirid having to decide what porn to put on the external: first world single's problem
18:35 joepie91 I'm going to take a break from IRC for a few days, if you really need me contact me on XMPP (joepie91@dukgo.com) or via e-mail (admin@cryto.net)... but only if it's important in some way
18:47 yipdw balrog: I got a response from PJ
18:47 balrog what did she say?
18:47 yipdw balrog: the reason why comments are blocked is because the majority of Groklaw members voted to keep them out of e.g. the LoC
18:47 balrog you can use PM if you want
18:48 balrog or if you have xmpp, I do xmpp with otr
18:48 yipdw oh, you want the whole email?
18:48 yipdw ok
18:48 balrog I don't quite trust irc
18:48 yipdw unverified OTR isn't much better :P
18:48 yipdw but sure
18:48 yipdw I'm yipdw@member.fsf.org on XMPP
20:12 omf_ Commentary on my first ArchiveTeam diagram? --> http://picpaste.com/pics/Pz81z7Mx.1377022875.png
20:12 omf_ What other processes would benefit from a chart like this?
20:41 S[h]O[r]T looks cool
20:41 S[h]O[r]T maybe a start of an archiving project for a big site
20:42 S[h]O[r]T someone says xx is dying need to save, people start investigating. it seems eventually some kind of leader emerges who is writing the code to get it started
20:43 omf_ I have quite a bit of that documented here --> http://pad.archivingyoursh.it/p/atpodcast
20:45 S[h]O[r]T a graph is nice :P
20:46 omf_ Yeah a flow chart would make sense as a format
20:46 omf_ I am putting it on my list
21:39 godane so looks like episodes 293 to 306 of labrats.tv were on rev3
21:39 godane i will add rev3 and revision3 to the keywords of those episodes
21:40 godane the creator will be labrats.tv
21:40 godane here are the itunes files: http://revision3.com/feed/show/labratsland/mp4-large/itunes
