Time |
Nickname |
Message |
00:13
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
00:29
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
00:42
🔗
|
|
tomwsmf-a has joined #archiveteam |
01:02
🔗
|
|
BlueMaxim has joined #archiveteam |
02:12
🔗
|
|
Coderjoe has quit IRC (Read error: Connection reset by peer) |
02:32
🔗
|
|
Coderjoe has joined #archiveteam |
02:45
🔗
|
|
Start has joined #archiveteam |
03:09
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
03:17
🔗
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
04:09
🔗
|
|
RichardG has quit IRC (Ping timeout: 260 seconds) |
04:21
🔗
|
|
JesseW has joined #archiveteam |
04:29
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:35
🔗
|
|
Sk1d has joined #archiveteam |
05:00
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
05:38
🔗
|
|
ariscop has quit IRC (Quit: Leaving) |
05:40
🔗
|
|
ariscop has joined #archiveteam |
06:44
🔗
|
|
RichardG has joined #archiveteam |
07:00
🔗
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
07:31
🔗
|
|
Honno has joined #archiveteam |
07:56
🔗
|
|
philpem has joined #archiveteam |
08:19
🔗
|
|
schbirid has joined #archiveteam |
08:23
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
08:36
🔗
|
|
godane has joined #archiveteam |
09:14
🔗
|
|
zhongfu has quit IRC (Remote host closed the connection) |
09:16
🔗
|
|
zhongfu has joined #archiveteam |
09:38
🔗
|
|
n3m0 has joined #archiveteam |
09:38
🔗
|
|
n3m0 is now known as skrp |
09:40
🔗
|
skrp |
can i get a serious pm. 100T+ bsd over zfs system. 2 Million files. 1Million+ books. I've been archiving on my own, but would appreciate a merger |
10:19
🔗
|
|
RichardG has joined #archiveteam |
10:31
🔗
|
bai |
skrp: https://twitter.com/textfiles/status/736270734033575936 |
10:33
🔗
|
|
Tomcat_ has joined #archiveteam |
10:36
🔗
|
|
M-davidar has joined #archiveteam |
10:41
🔗
|
|
M-davidar is now known as davidar_ |
10:53
🔗
|
|
Tomcat__ has joined #archiveteam |
10:53
🔗
|
|
Tomcat__ has quit IRC (Remote host closed the connection!) |
10:53
🔗
|
|
Tomcat_ has quit IRC (Read error: Operation timed out) |
10:58
🔗
|
|
Tomcat_ has joined #archiveteam |
10:58
🔗
|
|
Tomcat_ has quit IRC (Connection closed) |
10:58
🔗
|
|
Tomcat_ has joined #archiveteam |
10:58
🔗
|
|
Tomcat_ has quit IRC (Connection closed) |
10:59
🔗
|
|
Tomcat_ has joined #archiveteam |
11:06
🔗
|
skrp |
bai: :/ its hard to take tweeters seriously |
11:06
🔗
|
skrp |
the days when men were men and birds were birds... |
11:12
🔗
|
skrp |
*didn't mean to sound sexist. if he is a lonely housewife, its acceptable |
11:13
🔗
|
ivan |
skrp: what do you want to merge? |
11:15
🔗
|
ivan |
are your books from the libgen torrents? |
11:32
🔗
|
skrp |
ivan: do you already have all the libgen? |
11:33
🔗
|
skrp |
no they are wildcard actions ive done via torrent/http/ftp |
11:35
🔗
|
skrp |
ive been working on a c coded archive system that works over zfs. keeping all the files in one pool, naturally deduplicated as each file is named after its $index~$sha256^$filename@$previous_path |
11:36
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
11:37
🔗
|
skrp |
you add an input source and it extracts recursively everything while maintaining important 'metadata' |
11:40
🔗
|
skrp |
ripper -t http -i www.pedrk.com -o /uber_dump --index 010518 #this gives it an ultra transient always updated always deduplicated [zfs dedup is a shod] |
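[Editor's sketch] The naming scheme skrp describes above ($index~$sha256^$filename@$previous_path) can be illustrated roughly as follows. This is not skrp's C code; the function names are hypothetical, and the percent-encoding of "/" in the previous path is an assumption, since the log does not say how path separators are stored in a file name.

```python
import hashlib


def archive_name(index: str, data: bytes, filename: str, previous_path: str) -> str:
    """Build a name of the form index~sha256^filename@previous_path."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumption: "/" cannot appear inside a file name, so it is
    # percent-encoded here. The log does not say how this is handled.
    safe_path = previous_path.replace("%", "%25").replace("/", "%2F")
    return f"{index}~{digest}^{filename}@{safe_path}"


def content_key(name: str) -> str:
    """Return the sha256 part of a name: the text between the first '~' and the first '^'."""
    return name.split("~", 1)[1].split("^", 1)[0]


if __name__ == "__main__":
    a = archive_name("010518", b"same bytes", "book.pdf", "/pub/library")
    b = archive_name("010519", b"same bytes", "copy.pdf", "/mirror/books")
    # Identical content yields the same digest, so duplicates can be
    # spotted by comparing the digest portion of the names.
    print(content_key(a) == content_key(b))
```

Because the digest depends only on the file contents, two copies fetched from different sources share the same key even when their index, file name, and original path differ.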
11:48
🔗
|
|
BartoCH has quit IRC (Read error: Connection reset by peer) |
11:48
🔗
|
ivan |
skrp: no, I am lacking libgen |
11:48
🔗
|
ivan |
I've never worried about dedup because I just dump hundreds of TB into google drive :-) |
11:51
🔗
|
|
ivan is now known as ivan` |
11:51
🔗
|
skrp |
ivan: lol i dont trust The Machine. I maintain my own servers with my accounting business funds |
11:51
🔗
|
HCross |
Downloading a set of subreddits and other sites related to the EU Referendum |
11:52
🔗
|
|
ivan has joined #archiveteam |
11:52
🔗
|
skrp |
ivan`: libgen is a very hard to get deal. but if you are willing to deal :D |
11:54
🔗
|
skrp |
im trying to get into a group that shares my same philosophy "Data gets lost. Storage gets cheaper. So get everything now" |
11:56
🔗
|
Sanqui |
i think we're your people, even if some of us vary in scale - i barely own a single terabyte of my own data :) |
11:57
🔗
|
|
BartoCH has joined #archiveteam |
11:57
🔗
|
skrp |
Sanqui: well with one TB you could back up many htmls |
11:58
🔗
|
Sanqui |
absolutely - i help archive sites with the archivebot and keep some private material too. |
11:58
🔗
|
skrp |
the internet is a glass cannon. all universities share the same 'ebsco host' systems which amount to only 200k files. |
11:59
🔗
|
skrp |
once war hits the internet is bye bye. too insecure |
11:59
🔗
|
skrp |
so thats why i call myself noah of my bsd zfs ark haha |
12:00
🔗
|
skrp |
ivan`: the libgen is a lot larger than most ppl think, i suspect the russians also stored steganography information in them as well |
12:01
🔗
|
Sanqui |
anyway, if you want to speak to somebody serious about your collection, SketchCow's your guy |
12:05
🔗
|
skrp |
well ill be at bsdcan if anyone else from this group is a demon |
12:25
🔗
|
|
RichardG_ has joined #archiveteam |
12:26
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
12:39
🔗
|
arkiver |
RuTracker project is running again. |
12:52
🔗
|
|
ariscop has quit IRC (Ping timeout: 633 seconds) |
13:06
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
13:07
🔗
|
|
Simpbra1 has joined #archiveteam |
13:08
🔗
|
|
Simpbrain has quit IRC (Read error: Connection reset by peer) |
13:18
🔗
|
voltagex |
https://launchpad.net/~voltagex/+archive/ubuntu/wget-lua if anyone needs it - rebuilt wget-lua for newer Ubuntus (helpful for EC2 instances) |
13:36
🔗
|
|
metalcamp has joined #archiveteam |
14:02
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
14:05
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
14:09
🔗
|
arkiver |
New wikis are added to the wikis project: |
14:09
🔗
|
arkiver |
battlestarwiki.org |
14:09
🔗
|
arkiver |
editthis.info |
14:09
🔗
|
arkiver |
gamepedia.com |
14:09
🔗
|
arkiver |
miraheze.org |
14:09
🔗
|
arkiver |
referata.com |
14:09
🔗
|
arkiver |
wiki-site.com |
14:10
🔗
|
arkiver |
The lists are taken from the wikiteam project |
14:10
🔗
|
arkiver |
All external URLs from these sites will be grabbed in the wikis project. |
14:10
🔗
|
luckcolor |
firng up scripts in a sec |
14:10
🔗
|
luckcolor |
*firing |
14:11
🔗
|
arkiver |
Thanks |
14:11
🔗
|
arkiver |
It seems wiki-site.com is currently all failing, but that will be fixed. |
14:12
🔗
|
arkiver |
almost all* |
14:12
🔗
|
luckcolor |
concurrency 4 go! |
14:14
🔗
|
arkiver |
The grab has been running since November 2015. In November 2016 we will regrab all sites, to fetch new external URLs and fetch changed external URLs. |
15:03
🔗
|
|
RichardG_ is now known as RichardG |
15:23
🔗
|
|
tfgbd has quit IRC (Read error: Connection reset by peer) |
15:48
🔗
|
|
JesseW has joined #archiveteam |
16:30
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
16:42
🔗
|
Sanqui |
arkiver: can I just request more wikis to be added? |
17:00
🔗
|
|
JesseW has joined #archiveteam |
17:33
🔗
|
|
fie has quit IRC (Read error: Operation timed out) |
18:09
🔗
|
|
Zinob has joined #archiveteam |
18:10
🔗
|
Zinob |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
18:10
🔗
|
Zinob |
... oh ok, nvm. Just keep an eye on OpenCores.org? |
18:11
🔗
|
luckcolor |
it's yahoosucks |
18:12
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
18:12
🔗
|
luckcolor |
Zinob: what's the status of that website? |
18:12
🔗
|
|
Zei-Pii has joined #archiveteam |
18:14
🔗
|
Zinob |
luckcolor: It is all fine! But the company that hosts the sites biggest customer ran in to problems. |
18:15
🔗
|
luckcolor |
If you want i can schedule a grab on #archivebot |
18:17
🔗
|
luckcolor |
*Zinob: |
18:17
🔗
|
Zinob |
Either that site will get more love now that they have less to do with the other big company OR it might go belly-up... |
18:17
🔗
|
Zinob |
So yeah.. a grab might be in order. |
18:18
🔗
|
luckcolor |
Ok join the channel, in the topic there's the link to check the progress of the crawl |
18:18
🔗
|
Zinob |
I don't really care for the site personally, but it is for FPGA design what SourceForge is for the GPL community |
18:19
🔗
|
luckcolor |
Ok it's going |
18:21
🔗
|
Zinob |
Nice |
18:21
🔗
|
luckcolor |
Let me know if you need anything else :p |
18:21
🔗
|
Zinob |
I usually pester a friend that is in the archive-team but i thought that i could check for myself for once :) |
18:22
🔗
|
Zinob |
Great stuff, Keep the good work up. Your Wikipedia Archives have helped me a few times |
19:18
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
19:36
🔗
|
|
Jeroen52 has quit IRC (Ping timeout: 260 seconds) |
19:56
🔗
|
|
Jeroen52 has joined #archiveteam |
20:08
🔗
|
|
zino has quit IRC (Read error: Operation timed out) |
20:09
🔗
|
|
tomwsmf-a has joined #archiveteam |
20:22
🔗
|
|
tfgbd_znc has joined #archiveteam |
20:27
🔗
|
|
tfgbd_znc has quit IRC (Read error: Connection reset by peer) |
20:41
🔗
|
|
Aranje has joined #archiveteam |
21:17
🔗
|
|
VADemon has joined #archiveteam |
21:31
🔗
|
|
Zei-Pii has quit IRC (Read error: Connection reset by peer) |
21:34
🔗
|
|
maseck has quit IRC (Remote host closed the connection) |
21:40
🔗
|
|
maseck has joined #archiveteam |
22:47
🔗
|
arkiver |
Sanqui: always! |
22:47
🔗
|
arkiver |
For now only mediawikis are supported |
22:47
🔗
|
Sanqui |
is there a formal way, or should I just put them here? |
22:47
🔗
|
arkiver |
What an item looks like: |
22:49
🔗
|
arkiver |
for example, mediawikieu:bulpedia.wikia.com/api.php:bulpedia.wikia.com/wiki/ |
22:49
🔗
|
arkiver |
'eu' in mediawikieu means 'external urls' |
22:50
🔗
|
arkiver |
bulpedia.wikia.com/api.php is the location of the api.php |
22:50
🔗
|
arkiver |
bulpedia.wikia.com/wiki/ is the prefix for the articles |
22:50
🔗
|
arkiver |
for example, the above wiki has a page http://bulpedia.wikia.com/wiki/Jokes |
22:50
🔗
|
arkiver |
so the prefix is bulpedia.wikia.com/wiki/ |
22:51
🔗
|
arkiver |
it is different for different wikis |
22:51
🔗
|
arkiver |
if you have all that, it can be added to the warrior grab |
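[Editor's sketch] The item format arkiver describes above (type, api.php location, and article prefix, separated by colons) could be split apart roughly like this. The names WikiItem, parse_item, and article_url are hypothetical and not part of the actual warrior code; the sketch also assumes the fields carry no "http://" scheme or port number, as in the example item.

```python
from typing import NamedTuple


class WikiItem(NamedTuple):
    item_type: str       # e.g. "mediawikieu" ("eu" = external urls)
    api_location: str    # location of the wiki's api.php
    article_prefix: str  # prefix shared by all article URLs


def parse_item(item: str) -> WikiItem:
    """Split 'type:api_location:article_prefix' into its three fields."""
    parts = item.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected type:api_location:article_prefix, got {item!r}")
    return WikiItem(*parts)


def article_url(item: WikiItem, title: str) -> str:
    """Build an article URL from the prefix (assumes plain http, as in the log)."""
    return f"http://{item.article_prefix}{title}"


if __name__ == "__main__":
    item = parse_item("mediawikieu:bulpedia.wikia.com/api.php:bulpedia.wikia.com/wiki/")
    print(item.api_location)
    print(article_url(item, "Jokes"))
```

Since the prefix differs between wikis, keeping it as an explicit field avoids guessing it from the api.php location.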
23:03
🔗
|
|
FalconK has quit IRC (Ping timeout: 260 seconds) |
23:04
🔗
|
|
BlueMaxim has joined #archiveteam |
23:05
🔗
|
Sanqui |
arkiver: yeah, I can get that. |
23:17
🔗
|
|
FalconK has joined #archiveteam |
23:49
🔗
|
|
ariscop has joined #archiveteam |