#archiveteam 2013-06-03,Mon


Time Nickname Message
02:26 🔗 meMyself folks, is there a git repo to pull the same scripts as the vm image runs for a custom machine?
02:28 🔗 omf__ https://github.com/ArchiveTeam/warrior-code2
02:30 🔗 meMyself omf_: Thanks
02:59 🔗 ivan` https://code.google.com/p/httrack2arc/ via http://forum.httrack.com/readmsg/28483/24652/index.html
03:10 🔗 omf_ ivan`, I tried that program out, it failed on multiple httracks I tried. Digging into the code I found the problem in the regex used by LogReader.java
03:10 🔗 omf_ They use two really big ones that are far more complicated than necessary
03:11 🔗 ivan` ah, too bad
03:11 🔗 omf_ And I found their test suite to be laughable
03:11 🔗 omf_ https://code.google.com/p/httrack2arc/source/browse/trunk/src/pt/arquivo/httrack2arc/test/model/TestLogEntry.java
03:12 🔗 omf_ ivan`, no worries, we have plenty of httrack grabs so a better version of this program will happen
03:12 🔗 omf_ Also we just had a ton of projects and not much time for anything else
03:12 🔗 ivan` I have about 2000 httracks
03:13 🔗 omf_ I have a few hundred gigs including the opensolaris backup.
03:14 🔗 omf_ I had found that java converter when looking for a way to use httrack since it does not crash out like wget on some sites and has far more sophisticated configuration options.
03:19 🔗 omf_ it should be warc and not arc so much
03:43 🔗 TheArtist So I was archiving a few Blogspot sites for my own personal use, and I noticed the wget command on the wiki doesn't back up images
03:43 🔗 DFJustin blogspot images are hosted on a different hostname so you'd have to tweak it a bit
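The tweak DFJustin mentions could look roughly like the sketch below. It is not the command from the wiki. It assumes the images are served from subdomains of bp.blogspot.com (check the image URLs in the page source to confirm), and the function name, the `DRY_RUN` switch, and the blog name `example` are hypothetical. `--span-hosts` lets wget leave the blog's hostname, and `--domains` keeps it from wandering onto other hosts.

```shell
# Sketch: mirror one Blogspot blog plus images served from another hostname.
# ASSUMPTION: images come from *.bp.blogspot.com; verify in the page source.
grab_blogspot() {
    blog="$1"    # hypothetical blog name, e.g. "example" for example.blogspot.com
    # Build the wget command line in the positional parameters.
    set -- wget --mirror --page-requisites \
        --span-hosts \
        --domains="${blog}.blogspot.com,bp.blogspot.com" \
        -e robots=off \
        "http://${blog}.blogspot.com/"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"         # run wget for real
    fi
}
```

Running `DRY_RUN=1 grab_blogspot example` prints the command line so it can be inspected before any download starts.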
03:44 🔗 ivan` http://www.httrack.com/ new release
03:45 🔗 TheArtist woah
03:56 🔗 ivan` this is my function for making a warc without thinking, am I missing anything? function quick-warc { wget --warc-file="$1" --warc-cdx --mirror --page-requisites --no-check-certificate -e robots=off "http://$1/"; }
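A slightly more defensive variant of the function above, as a sketch only. It uses the same wget flags as the original message. The underscore name `quick_warc` (portable to shells that reject hyphens in function names), the usage check, and the `DRY_RUN` switch are additions that do not come from the log.

```shell
# Sketch: same wget flags as the quick-warc function, with an argument check.
quick_warc() {
    if [ -z "${1:-}" ]; then
        echo "usage: quick_warc HOSTNAME" >&2
        return 2
    fi
    host="$1"
    # The hostname doubles as the WARC file prefix, as in the original.
    set -- wget --warc-file="$host" --warc-cdx \
        --mirror --page-requisites \
        --no-check-certificate -e robots=off \
        "http://$host/"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"    # hypothetical dry-run switch: print, do not fetch
    else
        "$@"
    fi
}
```

Calling `quick_warc` with no argument prints a usage line and returns 2 instead of asking wget to fetch `http:///`.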
04:02 🔗 TheArtist Question: has there been any research into actually archiving TV Tropes
04:02 🔗 TheArtist I know there's a lone wget command buried on the wiki
04:03 🔗 omf_ Empty ChangeLog, NEWS file and one sentence on a website is not a good way to communicate how your software is getting better. I still love httrack though
11:26 🔗 samwyse This morning, I'm only getting this on my posterous tracker:
11:26 🔗 samwyse Starting GetItemFromTracker for Item
11:26 🔗 samwyse No item received. Retrying after 30 seconds...
11:26 🔗 samwyse Is everything OK at your end?
11:26 🔗 samwyse No item received. Retrying after 30 seconds...
11:47 🔗 ivan` samwyse: yeah, there are no more items, unless the things in out get cycled
11:47 🔗 ivan` tomorrow there will be a lot of greader items :-)
11:48 🔗 ivan` also http://tracker.archiveteam.org/formspring/
11:48 🔗 ivan` holy smokes, 64MB/s
11:56 🔗 godane so i'm grabbing theesa.com site
11:57 🔗 godane there are only ~2400 files there so i think it's a bit thin on grabs
11:59 🔗 SmileyG -rw-r--r-- 1 tim.bowers games 16M Apr 19 12:07 ./rotavault.ign.com-2013-04-17.cdx
11:59 🔗 SmileyG -rw-r--r-- 1 tim.bowers games 7.3G Apr 19 12:07 ./rotavault.ign.com-2013-04-17.warc
11:59 🔗 SmileyG -rw-r--r-- 1 tim.bowers games 470M May 31 14:00 ./rotavault.ign.com-2013-04-19.cdx
11:59 🔗 SmileyG -rw-r--r-- 1 tim.bowers games 52G May 31 14:01 ./rotavault.ign.com-2013-04-19.warc
11:59 🔗 SmileyG Got OOM'ed in the end....
12:00 🔗 SmileyG Pouet still going :)
12:01 🔗 SmileyG -rw-r--r-- 1 tim.bowers games 38G Jun 3 13:00 ./bin/ign/storage/pouet/pouet.net_06052013.warc
12:14 🔗 godane now this is very funny
12:15 🔗 godane there is a file called PulsePiracy.mpg
12:15 🔗 godane turns out that it's a g4 segment from the show called Pulse
19:41 🔗 Cowering fuck vbox.. should have just leeched newest vmware workstation
19:42 🔗 Cowering now i gotta redo my pristine vm since it has fucking vbox drivers inside
19:43 🔗 Cowering (oops, wrong channel!)
21:52 🔗 Shicky256 Hi
21:52 🔗 Shicky256 I have a quick question
21:52 🔗 SmileyG fire away Shicky256
21:52 🔗 SmileyG Someone will answer if they can
21:53 🔗 Shicky256 Why does Warrior say that there's no item received?
21:53 🔗 Shicky256 is the tracker down?
21:53 🔗 SmileyG Shicky256: what project are you running?
21:53 🔗 ivan` because there are no more items for posterous right now
21:54 🔗 Shicky256 I tried URLTeam as well, but it said no tasks available
21:54 🔗 ivan` http://tracker.archiveteam.org/formspring/ has a lot of items
21:55 🔗 SmileyG yeah Formspring is the only active project atm
21:55 🔗 Shicky256 then why is posterous recommended instead of that?
21:55 🔗 SmileyG URLTeam will return once it's swapped over to the new guys running it
21:55 🔗 SmileyG Shicky256: because alard isn't around to change it atm
21:55 🔗 Shicky256 Cool
21:55 🔗 SmileyG And I don't have access to the tracker to see how it's done :D
21:55 🔗 ivan` who else but alard can set the warrior priority?
21:55 🔗 SmileyG my guesses are ersi..... and that's it
21:56 🔗 SmileyG I don't know who else has tracker access from commandline.
21:56 🔗 ivan` underscor
21:56 🔗 Shicky256 What happened to the whole Formspring thing anyway? didn't it close over a month ago?
21:56 🔗 SmileyG alard: ping when you're around.
21:56 🔗 underscor hmm
21:56 🔗 SmileyG Shicky256: they got someone to buy it apparently, however as you might well guess, that can mean *anything*
21:56 🔗 underscor I don't know which redis key it is
21:56 🔗 SmileyG so we're still grabbing it, just in case :)
21:56 🔗 underscor If someone knows, I have shell on the box
21:56 🔗 Shicky256 lets hope it isn't yahoo
21:57 🔗 SmileyG Shicky256: hahaha I said that ;)
21:58 🔗 Shicky256 Seriously, yahoo closes everything. I give tumblr a year.
21:59 🔗 Marcelo Poor tumblr
21:59 🔗 Shicky256 well, gotta go.
22:02 🔗 SmileyG Marcelo: they will combine flickr and tumblr into some kind of mega product offering
22:02 🔗 SmileyG I give it ..... 2 years
22:02 🔗 SmileyG then eventually it'll all close
22:03 🔗 ivan` underscor: maybe warriorhq:projects_json, not sure if it's there, http://warriorhq.archiveteam.org/ is not responding for me
22:04 🔗 SmileyG we really need to document how to do things like this D:
22:04 🔗 Marcelo what?
22:05 🔗 SmileyG It'd be epic if we could turn on xanga again, and add new users to it as we go along.
22:05 🔗 ivan` underscor: see also warrior-hq/set-projects-json.rb
22:36 🔗 SmileyG the rotavault warc's are going up now :D