Time |
Nickname |
Message |
00:00
🔗
|
McGEE |
BoBeR182: you still here? |
00:03
🔗
|
|
ripvanwin has joined #archiveteam |
00:05
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
00:07
🔗
|
|
JW__ has quit IRC (Quit: Page closed) |
00:08
🔗
|
|
rejon has quit IRC (Ping timeout: 512 seconds) |
00:12
🔗
|
|
_ has quit IRC (Quit: Page closed) |
00:13
🔗
|
|
philpem has quit IRC (Ping timeout: 252 seconds) |
00:20
🔗
|
BoBeR182 |
yes McGEE |
00:21
🔗
|
McGEE |
BoBeR182: what's your plan of migration for zinelibrary |
00:21
🔗
|
McGEE |
write one up |
00:21
🔗
|
McGEE |
it's useful |
00:22
🔗
|
BoBeR182 |
Get as many pdfs from google cache and the torrent and wayback machine |
00:22
🔗
|
BoBeR182 |
then get volunteers/myself to tag pdfs by author and type |
00:22
🔗
|
BoBeR182 |
and rehost in an organized fashion |
00:24
🔗
|
|
DFJustin has joined #archiveteam |
00:24
🔗
|
|
swebb sets mode: +o DFJustin |
00:30
🔗
|
|
mistym has joined #archiveteam |
00:53
🔗
|
kniffy |
is there any word on archiving the FPH copy subreddits? |
00:53
🔗
|
kniffy |
some of them are pretty unsavory |
01:10
🔗
|
garyrh |
Yes, a bunch were put into ArchiveBot. |
01:19
🔗
|
|
schbirid2 has joined #archiveteam |
01:21
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
01:22
🔗
|
|
schbirid has joined #archiveteam |
01:23
🔗
|
|
username1 has quit IRC (Read error: Operation timed out) |
01:25
🔗
|
|
john4 has quit IRC (Ping timeout: 240 seconds) |
01:29
🔗
|
|
JesseW has joined #archiveteam |
01:33
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
01:34
🔗
|
|
SmileyG has joined #archiveteam |
01:35
🔗
|
|
Smiley has quit IRC (Read error: Connection reset by peer) |
01:35
🔗
|
|
sb057 has quit IRC (Ping timeout: 362 seconds) |
01:36
🔗
|
|
Mayonaise has joined #archiveteam |
01:36
🔗
|
|
jmc has joined #archiveteam |
01:38
🔗
|
BoBeR182 |
can ArchiveBot work with Google cache? |
01:45
🔗
|
|
primus104 has quit IRC (Leaving.) |
01:47
🔗
|
McGEE |
BoBeR182: doubt it |
01:47
🔗
|
yipdw |
it works fine |
01:47
🔗
|
McGEE |
oh? |
01:47
🔗
|
yipdw |
just save the cache URL |
01:47
🔗
|
McGEE |
ah ok |
01:47
🔗
|
McGEE |
BoBeR182: did you see that? |
01:47
🔗
|
yipdw |
usually, though, if the primary source is online you'll want to use that instead of the cache |
01:47
🔗
|
|
sb057 has joined #archiveteam |
01:47
🔗
|
McGEE |
yeah |
01:48
🔗
|
yipdw |
we don't rewrite or generate proxies for Google cache URLs |
01:48
🔗
|
McGEE |
although primary source is down |
01:48
🔗
|
McGEE |
according to BoBeR182 |
01:48
🔗
|
BoBeR182 |
zinelibrary.info |
01:48
🔗
|
BoBeR182 |
is down |
01:53
🔗
|
|
nico_32 has quit IRC (Ping timeout: 370 seconds) |
02:04
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
02:06
🔗
|
|
Mayonaise has joined #archiveteam |
02:15
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
02:19
🔗
|
|
aaaaaaaaa has quit IRC (Ping timeout: 370 seconds) |
02:34
🔗
|
|
mistym has joined #archiveteam |
02:38
🔗
|
WubTheCap |
Fyi, archive.is rewrites cache URLs |
02:38
🔗
|
WubTheCap |
Also another useful source |
02:52
🔗
|
|
sirdancea has quit IRC (Read error: Operation timed out) |
02:53
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
03:15
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
03:31
🔗
|
|
aaaaaaaaa has joined #archiveteam |
03:31
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
03:32
🔗
|
|
bzc6p_ has joined #archiveteam |
03:32
🔗
|
|
swebb sets mode: +o bzc6p_ |
03:33
🔗
|
|
mistym has joined #archiveteam |
03:37
🔗
|
|
bzc6p has quit IRC (Ping timeout: 600 seconds) |
03:38
🔗
|
|
koshkers has quit IRC (Read error: Connection reset by peer) |
03:43
🔗
|
|
rejon has joined #archiveteam |
03:46
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
03:59
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
04:00
🔗
|
|
JesseW has joined #archiveteam |
04:01
🔗
|
|
Mayonaise has joined #archiveteam |
04:14
🔗
|
|
Ymgve has quit IRC () |
04:17
🔗
|
|
mistym has joined #archiveteam |
04:23
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
04:24
🔗
|
|
closure has quit IRC (Ping timeout: 306 seconds) |
04:24
🔗
|
|
Mayonaise has joined #archiveteam |
04:25
🔗
|
|
closure has joined #archiveteam |
04:25
🔗
|
|
MMovie has quit IRC (Ping timeout: 306 seconds) |
04:27
🔗
|
|
MMovie has joined #archiveteam |
04:27
🔗
|
|
McGEE has quit IRC (Quit: Connection closed for inactivity) |
04:30
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:31
🔗
|
|
nico_32 has joined #archiveteam |
04:32
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
04:32
🔗
|
|
closure has quit IRC (Read error: Operation timed out) |
04:33
🔗
|
|
winr4r has quit IRC (Read error: Operation timed out) |
04:33
🔗
|
|
closure has joined #archiveteam |
04:34
🔗
|
|
MMovie2 has joined #archiveteam |
04:34
🔗
|
|
MMovie has quit IRC (Ping timeout: 306 seconds) |
04:34
🔗
|
|
Mayonaise has joined #archiveteam |
04:38
🔗
|
|
closure has quit IRC (Read error: Operation timed out) |
04:38
🔗
|
|
closure has joined #archiveteam |
04:39
🔗
|
|
MMovie has joined #archiveteam |
04:40
🔗
|
|
winr4r has joined #archiveteam |
04:40
🔗
|
|
MMovie2 has quit IRC (Ping timeout: 306 seconds) |
04:46
🔗
|
|
MMovie has quit IRC (Read error: Connection reset by peer) |
05:05
🔗
|
|
Boppen has quit IRC (Ping timeout: 198 seconds) |
05:14
🔗
|
|
Boppen has joined #archiveteam |
05:52
🔗
|
|
Asparagir has joined #archiveteam |
05:54
🔗
|
|
MMovie has joined #archiveteam |
06:15
🔗
|
|
bzc6p_ is now known as bzc6p |
06:27
🔗
|
bzc6p |
BoBeR182: from April: |
06:27
🔗
|
bzc6p |
"Help this site get back online. We need rad server space and an established collective to take over all aspects of the site. The founders of this site no longer have the time or effort to put into this project. It is time to pass the tourch to the next generation. If your group is interested, please contact zinelibrary ( at ) riseup.net . Thank you. " |
06:27
🔗
|
bzc6p |
So the shutdown wasn't unexpected |
06:28
🔗
|
bzc6p |
Why don't you (also) contact them? |
06:29
🔗
|
yipdw |
indeed; if they're checking their email it's a much more reliable way to get at the site data than scraping it from cache |
06:29
🔗
|
bzc6p |
If they still have it. |
06:29
🔗
|
yipdw |
yes |
06:30
🔗
|
BoBeR182 |
I emailed them twice, no answer |
06:30
🔗
|
|
Asparagir has quit IRC (Quit: Leaving) |
06:31
🔗
|
BoBeR182 |
I will do it again now that it's suspended; maybe someone will get me a response |
06:39
🔗
|
bzc6p |
The maintainer seems to have asked for help: http://www.reddit.com/r/Anarchism/comments/1c67ip/ |
06:40
🔗
|
bzc6p |
Look at the numbers. Not a big deal. I don't see why they can't keep it online |
06:41
🔗
|
|
rejon has quit IRC (Ping timeout: 369 seconds) |
06:41
🔗
|
bzc6p |
I guess it's less than $100 per YEAR |
06:43
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
06:45
🔗
|
bzc6p |
Oh, it's a bit older |
06:46
🔗
|
bzc6p |
I mean the reddit |
06:46
🔗
|
bzc6p |
But, still. |
07:01
🔗
|
* |
joepie91 waves at BoBeR182 |
07:12
🔗
|
|
fx_ has quit IRC (Read error: Operation timed out) |
07:15
🔗
|
|
fx_ has joined #archiveteam |
07:19
🔗
|
BoBeR182 |
oh hai |
07:19
🔗
|
* |
BoBeR182 waves back at joepie91 |
07:42
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
07:44
🔗
|
|
mistym has joined #archiveteam |
07:51
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
08:13
🔗
|
* |
SketchCow is up past 1,100 DVDs and CDs that will go on archive.org next week |
08:49
🔗
|
arkiver |
chfoo: can you please take baraza off of projects.json? |
08:49
🔗
|
arkiver |
both grab and discovery, they're done. |
08:50
🔗
|
|
cadbury__ has joined #archiveteam |
08:50
🔗
|
|
cadbury_ has quit IRC (Read error: Connection reset by peer) |
08:52
🔗
|
|
balrog has quit IRC (Ping timeout: 252 seconds) |
08:53
🔗
|
|
chfoo has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
godane has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
kisspunch has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
WubTheCap has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
ruukasu has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
Selanda has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
pwnsrv has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
Gfy has quit IRC (hub.se efnet.portlane.se) |
08:53
🔗
|
|
primus104 has joined #archiveteam |
08:59
🔗
|
|
Atluxity has quit IRC (Ping timeout: 360 seconds) |
09:06
🔗
|
|
Fletcher has quit IRC (Ping timeout: 252 seconds) |
09:12
🔗
|
|
PepsiMax has quit IRC (Read error: Operation timed out) |
09:14
🔗
|
|
balrog has joined #archiveteam |
09:14
🔗
|
|
swebb sets mode: +o balrog |
09:38
🔗
|
|
Fletcher has joined #archiveteam |
09:39
🔗
|
|
primus104 has quit IRC (Leaving.) |
09:47
🔗
|
|
mistym has joined #archiveteam |
09:53
🔗
|
|
mistym has quit IRC (Ping timeout: 252 seconds) |
10:03
🔗
|
|
ruukasu has joined #archiveteam |
10:03
🔗
|
|
godane has joined #archiveteam |
10:03
🔗
|
|
chfoo has joined #archiveteam |
10:03
🔗
|
|
kisspunch has joined #archiveteam |
10:03
🔗
|
|
WubTheCap has joined #archiveteam |
10:03
🔗
|
|
Selanda has joined #archiveteam |
10:03
🔗
|
|
pwnsrv has joined #archiveteam |
10:03
🔗
|
|
Gfy has joined #archiveteam |
10:03
🔗
|
|
efnet.portlane.se sets mode: +o chfoo |
10:06
🔗
|
|
sirdancea has joined #archiveteam |
10:08
🔗
|
|
Fletcher has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Muad-Dib has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
nox has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
_bryan has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
ersi_ has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Ctrl-S has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
russss__ has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
deathy has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
codl_ has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Zebranky has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Stiletto has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
kniffy has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
dugo has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Kazzy has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
lrkj has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Lord_Nigh has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Zero_Dogg has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
lhobas has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
Nemo_bis has quit IRC (hub.se efnet.port80.se) |
10:08
🔗
|
|
johtso has quit IRC (hub.se efnet.port80.se) |
10:12
🔗
|
|
Fletcher has joined #archiveteam |
10:12
🔗
|
|
Muad-Dib has joined #archiveteam |
10:12
🔗
|
|
nox has joined #archiveteam |
10:12
🔗
|
|
_bryan has joined #archiveteam |
10:12
🔗
|
|
ersi_ has joined #archiveteam |
10:12
🔗
|
|
Ctrl-S has joined #archiveteam |
10:12
🔗
|
|
russss__ has joined #archiveteam |
10:12
🔗
|
|
deathy has joined #archiveteam |
10:12
🔗
|
|
codl_ has joined #archiveteam |
10:12
🔗
|
|
Zebranky has joined #archiveteam |
10:12
🔗
|
|
Stiletto has joined #archiveteam |
10:12
🔗
|
|
kniffy has joined #archiveteam |
10:12
🔗
|
|
dugo has joined #archiveteam |
10:12
🔗
|
|
Kazzy has joined #archiveteam |
10:12
🔗
|
|
Zero_Dogg has joined #archiveteam |
10:12
🔗
|
|
lrkj has joined #archiveteam |
10:12
🔗
|
|
Lord_Nigh has joined #archiveteam |
10:12
🔗
|
|
lhobas has joined #archiveteam |
10:12
🔗
|
|
Nemo_bis has joined #archiveteam |
10:12
🔗
|
|
johtso has joined #archiveteam |
10:12
🔗
|
|
efnet.port80.se sets mode: +oo Lord_Nigh Nemo_bis |
10:23
🔗
|
|
ruukasu has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
godane has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
chfoo has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
kisspunch has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
WubTheCap has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
Selanda has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
pwnsrv has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
Gfy has quit IRC (hub.se efnet.portlane.se) |
10:23
🔗
|
|
Kenshin has quit IRC (hub.se irc.efnet.fr) |
10:37
🔗
|
arkiver |
https://textfiles.trovebox.com/ |
10:37
🔗
|
arkiver |
currently being saved in the trovebox project ^ |
10:48
🔗
|
|
ruukasu has joined #archiveteam |
10:48
🔗
|
|
godane has joined #archiveteam |
10:48
🔗
|
|
chfoo has joined #archiveteam |
10:48
🔗
|
|
kisspunch has joined #archiveteam |
10:48
🔗
|
|
WubTheCap has joined #archiveteam |
10:48
🔗
|
|
Selanda has joined #archiveteam |
10:48
🔗
|
|
pwnsrv has joined #archiveteam |
10:48
🔗
|
|
Gfy has joined #archiveteam |
10:48
🔗
|
|
efnet.portlane.se sets mode: +o chfoo |
10:51
🔗
|
|
SN4T14 has joined #archiveteam |
10:52
🔗
|
|
Emcy has joined #archiveteam |
10:55
🔗
|
|
Whisper has joined #archiveteam |
10:56
🔗
|
|
godane has quit IRC (Ping timeout: 265 seconds) |
10:57
🔗
|
|
Rickster has quit IRC (Ping timeout: 512 seconds) |
10:58
🔗
|
|
Atluxity has joined #archiveteam |
10:58
🔗
|
|
Whisper_ has quit IRC (Ping timeout: 512 seconds) |
10:58
🔗
|
|
lysobit has quit IRC (Ping timeout: 512 seconds) |
10:58
🔗
|
|
BlueMaxim has quit IRC (Ping timeout: 512 seconds) |
10:58
🔗
|
|
SN4T14_ has quit IRC (Ping timeout: 512 seconds) |
10:58
🔗
|
|
Famicoman has quit IRC (Ping timeout: 512 seconds) |
10:58
🔗
|
|
xmc has quit IRC (Ping timeout: 512 seconds) |
10:59
🔗
|
|
Emcy_ has quit IRC (Ping timeout: 512 seconds) |
10:59
🔗
|
|
BlueMaxim has joined #archiveteam |
10:59
🔗
|
|
Rickster has joined #archiveteam |
10:59
🔗
|
|
xmc has joined #archiveteam |
10:59
🔗
|
|
swebb sets mode: +o xmc |
10:59
🔗
|
|
lysobit has joined #archiveteam |
11:49
🔗
|
|
mistym has joined #archiveteam |
11:55
🔗
|
schbirid2 |
random dump of game mod things http://mirror-1.crystal-serv.com/ |
12:01
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
12:13
🔗
|
|
McGEE has joined #archiveteam |
12:22
🔗
|
|
Famicoman has joined #archiveteam |
12:31
🔗
|
|
sankin has joined #archiveteam |
12:32
🔗
|
|
Ymgve has joined #archiveteam |
12:47
🔗
|
|
primus104 has joined #archiveteam |
13:10
🔗
|
|
Deewiant has quit IRC (Remote host closed the connection) |
13:10
🔗
|
|
Deewiant has joined #archiveteam |
13:27
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
13:29
🔗
|
|
philpem has joined #archiveteam |
13:44
🔗
|
|
sirdancea has quit IRC (Read error: Operation timed out) |
13:51
🔗
|
|
mistym has joined #archiveteam |
13:58
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
14:26
🔗
|
|
nertzy has joined #archiveteam |
14:27
🔗
|
|
McGEE has quit IRC (Quit: Connection closed for inactivity) |
14:31
🔗
|
|
mistym has joined #archiveteam |
14:36
🔗
|
|
signius has quit IRC (Remote host closed the connection) |
14:38
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
14:53
🔗
|
|
JesseW has joined #archiveteam |
14:57
🔗
|
|
mistym has joined #archiveteam |
14:58
🔗
|
|
primus104 has quit IRC (Leaving.) |
15:05
🔗
|
|
signius has joined #archiveteam |
15:06
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
15:18
🔗
|
|
scyther has joined #archiveteam |
15:18
🔗
|
|
signius has quit IRC (Quit: Leaving) |
15:18
🔗
|
|
signius has joined #archiveteam |
15:20
🔗
|
|
signius has quit IRC (Client Quit) |
15:27
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
15:36
🔗
|
|
signius has joined #archiveteam |
15:41
🔗
|
|
nertzy has quit IRC (Quit: This computer has gone to sleep) |
15:42
🔗
|
|
rejon has joined #archiveteam |
15:58
🔗
|
|
signius has quit IRC (Quit: Leaving) |
16:01
🔗
|
|
bzc6p_ has joined #archiveteam |
16:01
🔗
|
|
swebb sets mode: +o bzc6p_ |
16:05
🔗
|
|
bzc6p has quit IRC (Read error: Operation timed out) |
16:08
🔗
|
|
signius has joined #archiveteam |
16:11
🔗
|
|
mistym has joined #archiveteam |
16:18
🔗
|
|
Start has quit IRC (Disconnected.) |
16:32
🔗
|
BoBeR182 |
anyone able to get a script that downloads all the pdfs from zinelibrary.info @ google cache and wayback machine |
16:34
🔗
|
|
rejon has quit IRC (Ping timeout: 369 seconds) |
16:44
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
16:45
🔗
|
|
mistym has joined #archiveteam |
16:50
🔗
|
|
primus104 has joined #archiveteam |
17:04
🔗
|
|
McGEE has joined #archiveteam |
17:04
🔗
|
arkiver |
SketchCow: have you seen my question yesterday about sourceforge? |
17:05
🔗
|
|
aaaaaaaaa has joined #archiveteam |
17:05
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
17:07
🔗
|
|
philpem has quit IRC (Ping timeout: 252 seconds) |
17:27
🔗
|
|
primus104 has quit IRC (Leaving.) |
17:28
🔗
|
|
godane has joined #archiveteam |
17:30
🔗
|
|
godane has quit IRC (Client Quit) |
17:30
🔗
|
|
godane has joined #archiveteam |
17:38
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
17:48
🔗
|
|
sirdancea has joined #archiveteam |
17:54
🔗
|
|
rejon has joined #archiveteam |
17:59
🔗
|
|
Ravenloft has joined #archiveteam |
18:01
🔗
|
|
xtr-201 has quit IRC (Read error: Connection reset by peer) |
18:18
🔗
|
|
VADemon has joined #archiveteam |
18:26
🔗
|
|
bzc6p_ is now known as bzc6p |
18:33
🔗
|
bzc6p |
BoBeR182: Well, I don't know how to search for a specific file type in Google, e.g. for pdf |
18:34
🔗
|
bzc6p |
Once you find the way, you can easily export the results like the way described here: http://archiveteam.org/index.php?title=Site_discovery#Google |
18:35
🔗
|
bzc6p |
(For the first: maybe type:pdf ? I don't know, that gives just a few results) |
18:36
🔗
|
DFJustin |
filetype:pdf |
18:38
🔗
|
bzc6p |
Then, according to a stackexchange thread, you'll need to search for the files you have the list of in the google webcache: http://webcache.googleusercontent.com/search?q=cache: |
18:38
🔗
|
bzc6p |
the url goes after that |
18:39
🔗
|
bzc6p |
However, you'll get the HTML version only (is there a way to get a PDF one?). But, surprisingly, there is a link at the top to the PDF, hosted on some other site! I don't understand why, but check it out yourself. |
18:42
🔗
|
bzc6p |
The above steps can be quite well (semi)automated. You have the list from the Google scrape, then sed "s/^/webcache.googleusercontent......./ etc., then wget -i <that list> -O - | grep ".pdf\"" | cut -d'"' -f 6 gives you the URLs to the originals, which you can easily download. |
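The steps bzc6p describes (prefix each scraped URL with the webcache address, fetch the HTML view, pull out the link that ends in .pdf) can be sketched in Python as well. This is a minimal sketch, not the pipeline anyone in the channel actually ran: the helper names `cache_url` and `extract_pdf_links` are made up for illustration, the percent-encoding of the original URL is an assumption, and the sample HTML in the usage note is invented.

```python
from html.parser import HTMLParser
from urllib.parse import quote

# Prefix quoted in the log; "the url goes after that".
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"


def cache_url(original_url):
    """Build the cache lookup URL for one original URL.

    Assumption: characters such as '?' or '&' in the original URL are
    percent-encoded so they do not collide with the query string.
    """
    return CACHE_PREFIX + quote(original_url, safe=":/")


class PdfLinkParser(HTMLParser):
    """Collect href values of <a> tags whose path ends in .pdf."""

    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Ignore any query string when checking the extension.
                if value.lower().split("?")[0].endswith(".pdf"):
                    self.pdf_links.append(value)


def extract_pdf_links(html):
    """Return every .pdf link found in a page of HTML, in document order."""
    parser = PdfLinkParser()
    parser.feed(html)
    return parser.pdf_links
```

Usage would be along the lines of calling `cache_url()` on each line of the scraped list, downloading each result, and feeding the page text to `extract_pdf_links()`. Parsing the anchors avoids depending on which quote-delimited field the link happens to fall in, which is what the `cut -f 6` step relies on.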
18:42
🔗
|
bzc6p |
As for the Wayback, |
18:43
🔗
|
* |
bzc6p looks up how to get that wayback urllist in csv |
18:43
🔗
|
* |
bzc6p knows he read it somewhere but doesn't remember where |
18:44
🔗
|
|
scyther has joined #archiveteam |
18:52
🔗
|
bzc6p |
F**k it, I'll just download and grep the HTML version. |
18:53
🔗
|
bzc6p |
DFJustin: thanks, it gave 20 times more results |
18:59
🔗
|
bzc6p |
BoBeR182: you must do the Google scrape and whatever, but I've created the list of PDFs available in the Wayback Machine, I'll transfer it to you |
19:00
🔗
|
|
Start has joined #archiveteam |
19:02
🔗
|
|
nertzy has joined #archiveteam |
19:02
🔗
|
bzc6p |
Oh. paste.archivingyoursh.it down |
19:05
🔗
|
arkiver |
yeah, it's been down for quite some time now |
19:07
🔗
|
* |
bzc6p hates to use third-party services, which an archivist usually downloads from, not uploads to, but *sigh* |
19:07
🔗
|
bzc6p |
BoBeR182: so here's the list of PDFs available in the Wayback Machine (gzipped): http://ddl7.data.hu/get/0/8844654/lista3.gz |
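bzc6p built this list by downloading and grepping the HTML listing. A different route to the same kind of list, shown here only as a sketch, is the Wayback Machine's CDX server, which can filter captures by MIME type. The function names below are hypothetical, and the choice of filters (PDF MIME type, HTTP 200, one row per URL) is an assumption about what such a list should contain.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain):
    """Build a CDX query listing PDF captures under a domain.

    With fl=timestamp,original the server returns plain text, one
    capture per line, as "<timestamp> <original-url>".
    """
    params = [
        ("url", domain),
        ("matchType", "domain"),                  # include subdomains
        ("filter", "mimetype:application/pdf"),   # PDFs only
        ("filter", "statuscode:200"),             # skip redirects/errors
        ("collapse", "urlkey"),                   # one row per distinct URL
        ("fl", "timestamp,original"),
    ]
    return CDX_ENDPOINT + "?" + urlencode(params)


def replay_urls(cdx_text):
    """Turn CDX output lines into download URLs.

    The "id_" suffix after the timestamp asks the Wayback Machine for
    the capture as originally stored, without the replay toolbar.
    """
    urls = []
    for line in cdx_text.splitlines():
        parts = line.split(" ", 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        timestamp, original = parts
        urls.append(f"http://web.archive.org/web/{timestamp}id_/{original}")
    return urls
```

Fetching `cdx_query_url("zinelibrary.info")` and passing the response body to `replay_urls()` would give a list that could be handed to `wget -i`.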
19:16
🔗
|
|
rejon has quit IRC (Ping timeout: 362 seconds) |
19:22
🔗
|
|
nertzy has quit IRC (Quit: This computer has gone to sleep) |
19:24
🔗
|
|
Start has quit IRC (Disconnected.) |
19:28
🔗
|
|
rejon has joined #archiveteam |
19:28
🔗
|
SketchCow |
No |
19:29
🔗
|
SketchCow |
(Didn't see your question) |
19:32
🔗
|
|
Start has joined #archiveteam |
19:34
🔗
|
aaaaaaaaa |
SketchCow: he is probably referring to http://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-06-11,Thu&sel=253#l249 |
19:39
🔗
|
SketchCow |
Ah |
19:39
🔗
|
SketchCow |
In 5 years, if someone wanted to recreate a sourceforge project and feel comfortable they could recover all the code they need to recreate any version, what do we need? |
19:39
🔗
|
SketchCow |
That's the question I want answered. |
19:40
🔗
|
bzc6p |
BoBeR182: forgot to say the list I have has 3,820 files, but you'll see anyway. |
19:41
🔗
|
|
xtr-201 has joined #archiveteam |
19:43
🔗
|
|
nertzy has joined #archiveteam |
19:44
🔗
|
|
primus104 has joined #archiveteam |
19:44
🔗
|
bzc6p |
I've always wondered if a software archive should contain all versions of a piece of software, and my answer would be yes if it weren't me who had to host that many times more content. Good thing is code compresses well. |
19:44
🔗
|
bzc6p |
BUT |
19:46
🔗
|
|
bithippo has joined #archiveteam |
19:46
🔗
|
bzc6p |
what arkiver suggests about (3) is more than enough to recreate every revision, considering that (1) also contains all revisions. |
19:47
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
19:48
🔗
|
* |
bzc6p realizes the question was probably addressed to arkiver specifically |
19:50
🔗
|
|
bithippo has quit IRC (Client Quit) |
19:51
🔗
|
|
nertzy has quit IRC (This computer has gone to sleep) |
20:00
🔗
|
SketchCow |
Yes, it was. |
20:00
🔗
|
SketchCow |
Your energy is refreshing, however. |
20:00
🔗
|
SketchCow |
Aim it at the Wiki |
20:02
🔗
|
Start |
we should get moving on radioshack soon |
20:02
🔗
|
Start |
www.radioshack.com redirects to comingsoon.radioshack.com |
20:03
🔗
|
Start |
all of the content is still there, but there's a banner at the top that says "RADIOSHACK.COM'S ONLINE STORE IS COMING BACK SOON!" |
20:03
🔗
|
Start |
rendering all content on the old site endangered |
20:04
🔗
|
aaaaaaaaa |
I could have sworn radioshack was archivebotted a few times. |
20:08
🔗
|
|
sirdancea has quit IRC (Read error: Operation timed out) |
20:16
🔗
|
|
mr_rippit has joined #archiveteam |
20:16
🔗
|
|
ripvanwin has quit IRC (Read error: Connection reset by peer) |
20:22
🔗
|
|
Start has quit IRC (Disconnected.) |
20:26
🔗
|
|
Start has joined #archiveteam |
20:29
🔗
|
|
lytv has quit IRC (Max SendQ exceeded) |
20:31
🔗
|
SketchCow |
A bunch. |
20:31
🔗
|
SketchCow |
I'm not worried about radioshack |
20:32
🔗
|
|
lytv has joined #archiveteam |
20:32
🔗
|
|
Emcy has quit IRC (Read error: Connection reset by peer) |
20:33
🔗
|
|
Emcy has joined #archiveteam |
20:35
🔗
|
|
Start has quit IRC (Disconnected.) |
20:35
🔗
|
|
mistym has joined #archiveteam |
20:40
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
20:51
🔗
|
|
Start has joined #archiveteam |
20:52
🔗
|
|
sankin has quit IRC (Leaving.) |
20:54
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
20:54
🔗
|
|
Start has joined #archiveteam |
21:16
🔗
|
|
VADemon has quit IRC (ny.us.hub west.us.hub) |
21:16
🔗
|
|
signius has quit IRC (ny.us.hub west.us.hub) |
21:16
🔗
|
|
Famicoman has quit IRC (ny.us.hub west.us.hub) |
21:16
🔗
|
|
xmc has quit IRC (ny.us.hub west.us.hub) |
21:16
🔗
|
|
dan- has quit IRC (ny.us.hub west.us.hub) |
21:19
🔗
|
|
Start has quit IRC (Disconnected.) |
21:22
🔗
|
|
VADemon has joined #archiveteam |
21:22
🔗
|
|
signius has joined #archiveteam |
21:22
🔗
|
|
Famicoman has joined #archiveteam |
21:22
🔗
|
|
xmc has joined #archiveteam |
21:22
🔗
|
|
dan- has joined #archiveteam |
21:22
🔗
|
|
irc.eversible.com sets mode: +o xmc |
21:22
🔗
|
|
swebb sets mode: +o xmc |
21:25
🔗
|
|
rejon has quit IRC (Ping timeout: 362 seconds) |
21:34
🔗
|
|
nertzy has joined #archiveteam |
21:40
🔗
|
|
nertzy has quit IRC (Quit: This computer has gone to sleep) |
21:53
🔗
|
|
philpem has joined #archiveteam |
22:00
🔗
|
|
Stiletto is now known as Stilett0 |
22:06
🔗
|
|
nertzy has joined #archiveteam |
22:10
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
22:13
🔗
|
|
sirdancea has joined #archiveteam |
22:19
🔗
|
|
kyan has joined #archiveteam |
22:28
🔗
|
arkiver |
SketchCow: the git file (1) contains all revisions and is the easiest to work with if you want to edit the project and have all revisions |
22:29
🔗
|
arkiver |
however, I also think the sourcecode (2) on sourceforge.net should be saved, for people to easily view it, without having to search for the .git file somewhere on the archive. Also, a lot of people probably don't know how to use a .git file. |
22:30
🔗
|
arkiver |
So git files should definitely be grabbed (but aren't viewable in the wayback machine). |
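To illustrate why a git copy answers the "recreate any version" question: a mirror clone carries every branch, tag and commit, it can be packed into a single bundle file for storage, and any revision can later be exported from it. The sketch below only assembles the git commands. The function names, paths and the example repository URL are hypothetical, and it is not the grab script arkiver mentions.

```python
import subprocess


def mirror_commands(repo_url, mirror_dir, bundle_path):
    """Commands to copy a repository with its full history.

    `git clone --mirror` fetches all refs into a bare repository;
    `git bundle create ... --all` packs them into one file.
    """
    return [
        ["git", "clone", "--mirror", repo_url, mirror_dir],
        ["git", "-C", mirror_dir, "bundle", "create", bundle_path, "--all"],
    ]


def export_commands(mirror_dir, revision, out_tar):
    """Command to export the tree of one revision (tag, branch or commit)."""
    return [
        ["git", "-C", mirror_dir, "archive", "--format=tar",
         "--output=" + out_tar, revision],
    ]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `run(mirror_commands("git://example.org/project.git", "project.git", "/tmp/project.bundle"))` followed by `run(export_commands("project.git", "v1.0", "/tmp/project-v1.0.tar"))`, where the URL, the absolute bundle path and the tag name are placeholders.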
22:30
🔗
|
SketchCow |
15:39 <@SketchCow> In 5 years, if someone wanted to recreate a sourceforge project and feel comfortable they could recover all the code they need to recreate any version, what do we need? |
22:30
🔗
|
SketchCow |
That is all |
22:30
🔗
|
|
nertzy has quit IRC (Quit: This computer has gone to sleep) |
22:31
🔗
|
SketchCow |
However you make that, I trust you |
22:31
🔗
|
SketchCow |
If it takes someone several steps to reconstitute, that's fine |
22:31
🔗
|
arkiver |
SketchCow: then git files will definitely be done. |
22:33
🔗
|
arkiver |
But I'd also like to grab the sourcecode as on sourceforge.net, it is the same as the git file, but viewable in the wayback machine. However, since that's the same as the git file the size will also double (maybe a bit more). Shall we also do the sourcecode on sourceforge? |
22:35
🔗
|
arkiver |
SketchCow ^ both git and sourcecode on sourceforge.net? or only git? |
22:36
🔗
|
SketchCow |
I am fine with doubling. |
22:36
🔗
|
SketchCow |
This is a very unique item in software history. |
22:36
🔗
|
arkiver |
ok, yes! :) |
22:36
🔗
|
arkiver |
Thanks, let's do that! |
22:36
🔗
|
arkiver |
We're making good progress on the scripts, so we should start in a few days (hopefully). |
22:45
🔗
|
|
Start has joined #archiveteam |
22:46
🔗
|
Nemo_bis |
whee 1,100 DVDs and CDs, can't wait for it :) |
22:54
🔗
|
SketchCow |
FOS is at 41% full. Hopefully POMF finishes soon. |
22:58
🔗
|
xmc |
from sourceforge i would like to get all the source control history |
22:59
🔗
|
xmc |
then the downloads, then the bugtrackers and wikis |
22:59
🔗
|
xmc |
somewhere in there also is project homepages |
22:59
🔗
|
xmc |
they have a geocities-oid thingie iirc |
23:05
🔗
|
|
trs80 has quit IRC (Remote host closed the connection) |
23:15
🔗
|
|
TheLQ_ has joined #archiveteam |
23:18
🔗
|
|
TheLQ has quit IRC (Ping timeout: 306 seconds) |
23:24
🔗
|
|
BlueMaxim has joined #archiveteam |
23:29
🔗
|
|
TheLQ_ has quit IRC (Read error: Operation timed out) |
23:50
🔗
|
|
kyan has quit IRC (Quit: This computer has gone to sleep) |
23:51
🔗
|
|
kyan has joined #archiveteam |
23:56
🔗
|
|
kyan has quit IRC (Ping timeout: 258 seconds) |