Time |
Nickname |
Message |
02:26
|
meMyself |
folks, is there a git repo to pull the same scripts as the vm image runs for a custom machine? |
02:28
|
omf__ |
https://github.com/ArchiveTeam/warrior-code2 |
02:30
|
meMyself |
omf_: Thanks |
02:59
|
ivan` |
https://code.google.com/p/httrack2arc/ via http://forum.httrack.com/readmsg/28483/24652/index.html |
03:10
|
omf_ |
ivan`, I tried that program out, it failed on multiple httracks I tried. Digging into the code I found the problem in the regex used by LogReader.java |
03:10
|
omf_ |
They use two really big ones that are far more complicated than necessary |
03:11
|
ivan` |
ah, too bad |
03:11
|
omf_ |
And I found their test suite to be laughable |
03:11
|
omf_ |
https://code.google.com/p/httrack2arc/source/browse/trunk/src/pt/arquivo/httrack2arc/test/model/TestLogEntry.java |
03:12
|
omf_ |
ivan`, no worries, we have plenty of httrack grabs so a better version of this program will happen |
03:12
|
omf_ |
Also we just had a ton of projects and not much time for anything else |
03:12
|
ivan` |
I have about 2000 httracks |
03:13
|
omf_ |
I have a few hundred gigs including the opensolaris backup. |
03:14
|
omf_ |
I had found that java converter when looking for a way to use httrack since it does not crash out like wget on some sites and has far more sophisticated configuration options. |
03:19
|
omf_ |
it should be warc and not arc so much |
03:43
|
TheArtist |
So I was archiving a few Blogspot sites for my own personal use, and I noticed the wget command on the wiki doesn't back up images |
03:43
|
DFJustin |
blogspot images are hosted on a different hostname so you'd have to tweak it a bit |
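Editor's note: the tweak DFJustin describes could look roughly like the sketch below. It is not the wiki's command. It assumes the images are served from hosts under bp.blogspot.com (check the img URLs in a post first), and the blog name example.blogspot.com and the helper blogspot_wget_cmd are hypothetical. The function only prints a wget command so it can be reviewed before anything is downloaded.

```shell
#!/bin/sh
# Sketch: build a wget command that mirrors a Blogspot blog and is also
# allowed to fetch page requisites (images) from a second hostname.
# ASSUMPTION: images live under *.bp.blogspot.com; verify before relying on it.

blogspot_wget_cmd() {
  blog_host="$1"                       # e.g. example.blogspot.com (hypothetical)
  image_domain="${2:-bp.blogspot.com}" # assumed image host suffix
  # --span-hosts lets wget leave the starting host; --domains limits where it may go.
  printf '%s\n' "wget --mirror --page-requisites --span-hosts --domains=${blog_host},${image_domain} -e robots=off http://${blog_host}/"
}

# Print the command for review; paste it into a shell to actually run it.
blogspot_wget_cmd "example.blogspot.com"
```

Restricting --domains to the blog itself plus the image host keeps --span-hosts from wandering off to every other site the blog links to.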
03:44
|
ivan` |
http://www.httrack.com/ new release |
03:45
|
TheArtist |
woah |
03:56
|
ivan` |
this is my function for making a warc without thinking, am I missing anything? function quick-warc { wget --warc-file="$1" --warc-cdx --mirror --page-requisites --no-check-certificate -e robots=off "http://$1/"; } |
04:02
|
TheArtist |
Question: has there been any research into actually archiving TV Tropes? |
04:02
|
TheArtist |
I know there's a lone wget command buried on the wiki |
04:03
|
omf_ |
Empty ChangeLog, NEWS file and one sentence on a website is not a good way to communicate how your software is getting better. I still love httrack though |
11:26
|
samwyse |
This morning, I'm only getting this on my posterous tracker: |
11:26
|
samwyse |
Starting GetItemFromTracker for Item |
11:26
|
samwyse |
No item received. Retrying after 30 seconds... |
11:26
|
samwyse |
Is everything OK at your end? |
11:26
|
samwyse |
No item received. Retrying after 30 seconds... |
11:47
|
ivan` |
samwyse: yeah, there are no more items, unless the things in out get cycled |
11:47
|
ivan` |
tomorrow there will be a lot of greader items :-) |
11:48
|
ivan` |
also http://tracker.archiveteam.org/formspring/ |
11:48
|
ivan` |
holy smokes, 64MB/s |
11:56
|
godane |
so i'm grabbing the theesa.com site |
11:57
|
godane |
there are only ~2400 files there so i think it's a bit thin on grabs |
11:59
|
SmileyG |
-rw-r--r-- 1 tim.bowers games 16M Apr 19 12:07 ./rotavault.ign.com-2013-04-17.cdx |
11:59
|
SmileyG |
-rw-r--r-- 1 tim.bowers games 7.3G Apr 19 12:07 ./rotavault.ign.com-2013-04-17.warc |
11:59
|
SmileyG |
-rw-r--r-- 1 tim.bowers games 470M May 31 14:00 ./rotavault.ign.com-2013-04-19.cdx |
11:59
|
SmileyG |
-rw-r--r-- 1 tim.bowers games 52G May 31 14:01 ./rotavault.ign.com-2013-04-19.warc |
11:59
|
SmileyG |
Got OOM'ed in the end.... |
12:00
|
SmileyG |
Pouet still going :) |
12:01
|
SmileyG |
-rw-r--r-- 1 tim.bowers games 38G Jun 3 13:00 ./bin/ign/storage/pouet/pouet.net_06052013.warc |
12:14
|
godane |
now this is very funny |
12:15
|
godane |
there is a file called PulsePiracy.mpg |
12:15
|
godane |
turns out that it's a g4 segment from the show called Pulse |
19:41
|
Cowering |
fuck vbox.. should have just leeched newest vmware workstation |
19:42
|
Cowering |
now i gotta redo my pristine vm since it has fucking vbox drivers inside |
19:43
|
Cowering |
(oops, wrong channel!) |
21:52
|
Shicky256 |
Hi |
21:52
|
Shicky256 |
I have a quick question |
21:52
|
SmileyG |
fire away Shicky256 |
21:52
|
SmileyG |
Someone will answer if they can |
21:53
|
Shicky256 |
Why does Warrior say that there's no item received? |
21:53
|
Shicky256 |
is the tracker down? |
21:53
|
SmileyG |
Shicky256: what project are you running? |
21:53
|
ivan` |
because there are no more items for posterous right now |
21:54
|
Shicky256 |
I tried URLTeam as well, but it said no tasks available |
21:54
|
ivan` |
http://tracker.archiveteam.org/formspring/ has a lot of items |
21:55
|
SmileyG |
yeah Formspring is the only active project atm |
21:55
|
Shicky256 |
then why is posterous recommended instead of that? |
21:55
|
SmileyG |
URLTeam will return once it's swapped over to the new guys running it |
21:55
|
SmileyG |
Shicky256: because alard isn't around to change it atm |
21:55
|
Shicky256 |
Cool |
21:55
|
SmileyG |
And I don't have access to the tracker to see how it's done :D |
21:55
|
ivan` |
who else but alard can set the warrior priority? |
21:55
|
SmileyG |
my guesses are ersi..... and that's it |
21:56
|
SmileyG |
I don't know who else has tracker access from commandline. |
21:56
|
ivan` |
underscor |
21:56
|
Shicky256 |
What happened to the whole Formspring thing anyway? didn't it close over a month ago? |
21:56
|
SmileyG |
alard: ping when you're around. |
21:56
|
underscor |
hmm |
21:56
|
SmileyG |
Shicky256: they got someone to buy it apparently, however as you might well guess, that can mean *anything* |
21:56
|
underscor |
I don't know which redis key it is |
21:56
|
SmileyG |
so we're still grabbing it, just in case :) |
21:56
|
underscor |
If someone knows, I have shell on the box |
21:56
|
Shicky256 |
lets hope it isn't yahoo |
21:57
|
SmileyG |
Shicky256: hahaha I said that ;) |
21:58
|
Shicky256 |
Seriously, yahoo closes everything. I give tumblr a year. |
21:59
|
Marcelo |
Poor tumblr |
21:59
|
Shicky256 |
well, gotta go. |
22:02
|
SmileyG |
Marcelo: they will combine flickr and tumblr into some kind of mega product offering |
22:02
|
SmileyG |
I give it ..... 2 years |
22:02
|
SmileyG |
then eventually it'll all close |
22:03
|
ivan` |
underscor: maybe warriorhq:projects_json, not sure if it's there, http://warriorhq.archiveteam.org/ is not responding for me |
22:04
|
SmileyG |
we really need to document how to do things like this D: |
22:04
|
Marcelo |
what? |
22:05
|
SmileyG |
It'd be epic if we could turn on xanga again, and add new users to it as we go along. |
22:05
|
ivan` |
underscor: see also warrior-hq/set-projects-json.rb |
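Editor's note: one way to check ivan`'s guess about the redis key from a shell on the tracker box might be the sketch below. It assumes redis-cli is installed, that the server is on the default host and port, and that the relevant keys share the "warriorhq:" prefix; none of that is confirmed in the log. The helper inspect_warriorhq_keys and the REDIS_CLI override are hypothetical names added for illustration.

```shell
#!/bin/sh
# Sketch: list keys under a prefix together with their Redis type, to find
# the one that holds the warrior project list.
# ASSUMPTIONS: redis-cli is available, default host/port, prefix "warriorhq:".

REDIS_CLI="${REDIS_CLI:-redis-cli}"   # overridable so the function can be dry-run

inspect_warriorhq_keys() {
  prefix="${1:-warriorhq:}"
  # --scan iterates with SCAN, so it does not block the server the way KEYS can.
  "$REDIS_CLI" --scan --pattern "${prefix}*" | while IFS= read -r key; do
    # </dev/null keeps the inner call from consuming the key list on stdin.
    printf '%s\t%s\n' "$key" "$("$REDIS_CLI" TYPE "$key" </dev/null)"
  done
}

# Usage on the tracker box:
#   inspect_warriorhq_keys
# If a candidate such as warriorhq:projects_json shows up as a string:
#   redis-cli GET warriorhq:projects_json
```

This only reads from redis. Changing the recommended project is a separate step, and the script ivan` points to (warrior-hq/set-projects-json.rb) is the place to look for how that is done.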
22:36
|
SmileyG |
the rotavault warcs are going up now :D |