Time | Nickname | Message
06:21 | Vito`` | hi, just learned that Fileplanet was being shut down and archived
06:22 | Vito`` | I used to help run Polycount, which used to have all their 3D models hosted there
06:23 | Vito`` | is the best way to find all the files they lost by going through the metadata of the tars?
06:23 | Vito`` | the wiki says all the data on the page is otherwise outdated
08:23 | godane | good thing i backed stillflying.net up: http://fireflyfans.net/mthread.aspx?bid=2&tid=53804
09:25 | schbirid1 | Vito``: #fireplanet :)
09:25 | schbirid1 | Vito``: we have not uploaded much yet
09:25 | schbirid1 | we will have a nice interface some day
09:25 | schbirid1 | but actually not for the polycount stuff (because that is from the older planet* hosting and people put private files up in their spaces)
09:25 | schbirid1 | we got ALL the files so we cannot publish that
09:26 | schbirid1 | i am trying to host it so that if you know a path, you can download it (no public index)
09:26 | schbirid1 | that should prevent privacy issues
09:30 | schbirid1 | Vito``: i have the whole planetquake stuff locally on my machine, so if you need a specific file, just shout
09:30 | schbirid1 | i thought the models were mirrored by others already though, eg leileilol
09:51 | Vito`` | schbirid1: if I compiled a list of paths, you have them locally
09:51 | Vito`` | ?
09:53 | schbirid1 | yeah
11:44 | hiker1 | http://www.familyguyonline.com/ is shutting down Jan. 18. Might be worth grabbing whatever is on the site now, to remember the game.
11:44 | hiker1 | they will probably redirect the domain eventually.
15:05 | no2pencil | Merry Christmas Archivers!!
15:31 | hiker1 | Is there a tutorial somewhere on how to use wget for different sites?
15:37 | Nemo_bis | hiker1: there are wget examples on the pages of many services on our wiki
15:38 | hiker1 | What do you mean?
15:38 | hiker1 | Can you give me an example?
15:45 | tef | wget -r -nH -np -Amp3 --cut-dirs=1 http://foo.com/~tef/filez
15:46 | tef | makes a directory 'filez' with all the mp3s it found
15:46 | tef | -r - recursive, follow links
15:46 | tef | -nH - don't make a directory for the host (foo.com)
15:46 | tef | -np - don't go to a parent directory
15:46 | tef | --cut-dirs=1 - strip '~tef' from the path
15:46 | tef | -Amp3 - only save mp3s
15:48 | hiker1 | That doesn't use warc output.
15:50 | Deewiant | http://www.archiveteam.org/index.php?title=Wget_with_WARC_output#Usage
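For reference, a minimal sketch of the sort of invocation that wiki page covers, with example.com and the WARC name standing in as placeholders (the page itself has the currently recommended flags):

    # mirror a site and also record every request/response into example-site.warc.gz
    wget --mirror --page-requisites -e robots=off --warc-file=example-site "http://example.com/"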
15:50 | tef | oh
15:51 | hiker1 | I was using things like URL rewriting
15:51 | hiker1 | and some other options
15:51 | hiker1 | It varies so much by website
15:54 | hiker1 | and to download all the site's prerequisites
17:54 | hiker1 | Is it possible to append to a warc file?
17:55 | hiker1 | or append to a wget mirror?
17:55 | hiker1 | The site I mirrored apparently uses a subdomain, but I used the --no-parent argument.
17:57 | hiker1 | I also used --convert-links, but it did not convert links to the subdomain.
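A possible way around the subdomain problem, sketched with placeholder hostnames: a recursive wget stays on the starting host by default, and --convert-links only rewrites links to files that were actually fetched, so the subdomain has to be pulled in explicitly with --span-hosts and --domains:

    # also follow links onto the listed hosts (placeholders) so the subdomain gets mirrored
    wget --mirror --page-requisites --convert-links --span-hosts --domains=example.com,media.example.com "http://example.com/"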
19:09 | schbirid1 | hiker2: from what i know, no
19:09 | schbirid1 | you can use -c but iirc it does not work too well with -m usually
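In other words there is no appending to an existing WARC: a rerun with -c (--continue) and -m (--mirror) can pick up files already on disk, but any WARC output goes into a fresh file, along the lines of the sketch below (names are placeholders, and as noted above the combination is not reliable):

    # resume a mirror already on disk, writing newly fetched records to a new WARC
    wget -c -m --warc-file=example-site-part2 "http://example.com/"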
19:10 | hiker2 | Someone in here mentioned they grabbed all the urls from a site before actually downloading the site. Is this possible? useful?
19:21 | schbirid1 | depends on the website
19:21 | schbirid1 | you can use --spider
19:21 | schbirid1 | BUT that will download, just not store
19:21 | hiker2 | When would that be useful?
19:21 | schbirid1 | if you have no space and want to find out about the site structure
19:22 | schbirid1 | or if you are just interested in the URLs, not the data
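A rough sketch of that URL-gathering approach, with a placeholder URL and an arbitrary recursion depth; --spider makes wget traverse without keeping pages, -o writes the log, and the visited URLs can then be pulled back out of that log:

    # walk the site without saving pages, then extract the URLs wget reported
    wget --spider -r -l 3 -o spider.log "http://example.com/"
    grep -oE 'https?://[^ ]+' spider.log | sort -u > urls.txt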
19:22 | hiker2 | It seems that since wget has no way to continue warc downloads, it would be useful to create a program that does.
19:22 | hiker2 | *can
19:23 | hiker2 | wget doesn't seem particularly well-suited to download complete mirrors of websites.
19:32 | schbirid1 | it could be better for sure
19:32 | schbirid1 | also eats memory :(
19:32 | schbirid1 | there is heritrix which archive.org uses but i never tried that
19:32 | hiker2 | httrack as well
19:33 | hiker2 | but I don't think it supports WARC
19:34 | schbirid1 | i have had awful results with httrack
19:34 | hiker2 | someone wrote http://code.google.com/p/httrack2arc/
19:34 | hiker2 | which converts HTTrack output to ARC format
19:34 | hiker2 | When I used HTTrack it worked for what I needed.
20:04 | hiker2 | I think it resumes too
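For comparison, a basic HTTrack run of the kind being discussed, with a placeholder URL, output directory and filter; its resume behaviour and exact options are best checked against the HTTrack docs rather than taken from this sketch:

    # mirror example.com into ./example-mirror, restricted to that domain
    httrack "http://example.com/" -O ./example-mirror "+*.example.com/*"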
20:04 | ersi | Too bad it's running on a retarded operating system with a crappy file system that's case insensitive
20:06 | schbirid1 | httrack is on linux too
20:07 | ersi | Huh, didn't know that
20:21 | SketchCow | MERRY CHRISTMAS ARCHIVE TEAM
20:21 | SketchCow | JESUS SAVES AND SO DO WE
20:23 | SmileyG | \o/
20:35 | ersi | http://i.imgur.com/Jek9D.jpg
21:07 | rubita | http://www.carolinaherrera.com/212/es/areyouonthelist?share=2zkuHzwOxvy930fvZN7HOVc97XE-GNOL1fzysCqIoynkz4rz3EUUdzs6j6FXsjB4447F-isvxjqkXd4Qey2GHw#teaser
21:14 | rubita | http://www.carolinaherrera.com/212/es/areyouonthelist?share=XTv1etZcVd-19S-VT5m1-oIXWSwtlJ3dj4ARKTLVwK7kz4rz3EUUdzs6j6FXsjB4447F-isvxjqkXd4Qey2GHw#episodio-1
21:24 | SketchCow | BUT MY EXPENSIVE UNWANTED THING
21:25 | chronomex | I like how the first thing to load on that page is a php error
22:44 | tef | heritrix isn't that good :v
22:59 | SketchCow | what, in general?
23:01 | ersi | I guess in the context of the earlier conversation, ie for a random-person-grab-site-expedition
23:10 | tef | SketchCow: well, it's a million lines of code, kinda interwoven. it sorta does the job though
23:10 | tef | my impression from picking through it trying to find out the format idiosyncrasies of ARC made me unhappy
23:11 | tef | at work we use something like phantomjs + mitmproxy to dump warcs.
23:17 | tef | don't get me wrong, i haven't had to use it in anger, but wget should perform just as well, considering it likely has very similar crawling logic
23:20 | hiker2 | Is there a way to get wget to download external images?
23:20 | hiker2 | like from tinypic.
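One hedged way to attempt that: --page-requisites pulls in inline images, and --span-hosts with --domains lets wget leave the starting host for the listed ones (the hostnames here are placeholders for wherever the images actually live):

    # fetch the site plus inline images hosted on the listed external domains (placeholders)
    wget --mirror --page-requisites --span-hosts --domains=example.com,tinypic.com "http://example.com/"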